NETWORK GRANGER CAUSALITY WITH INHERENT
GROUPING STRUCTURE 

By Sumanta Basu*, Ali Shojaie† and George Michailidis*
University of Michigan* and University of Washington†

The problem of estimating high-dimensional network models arises 
naturally in the analysis of many physical, biological and socio-economic 
systems. Examples include stock price fluctuations in financial mar- 
kets and gene regulatory networks representing effects of regulators 
(transcription factors) on regulated genes in genetics. We aim to learn 
the structure of the network over time employing the framework of 
Granger causal models under the assumptions of sparsity of its edges 
and inherent grouping structure among its nodes. We introduce a 
thresholded variant of the Group Lasso estimator for discovering 
Granger causal interactions among the nodes of the network. Asymp- 
totic results on the consistency of the new estimation procedure are 
developed. The performance of the proposed methodology is assessed 
through an extensive set of simulation studies and comparisons with 
existing techniques. 

1. Introduction. Granger causality [Granger, 1969] provides a statistical framework for determining, through a series of statistical tests, whether a time series X is useful in forecasting another time series Y. It has found wide applicability in economics, including testing relationships between money and income [Sims, 1972], government spending and taxes on economic output [Blanchard and Perotti, 2002], and stock price and volume [Hiemstra and Jones, 1994].
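To make the two-series test concrete, here is a minimal sketch (Python with NumPy) of the classical bivariate Granger F-test: regress Y on its own past (restricted model), then on the past of both Y and X (full model), and compare the residual sums of squares. The lag order `lags`, the intercept and the function names are illustrative assumptions, not taken from this paper.

```python
import numpy as np

def lagged(series, lags):
    """Column l-1 holds the series shifted by l steps (l = 1..lags), aligned with times lags..T-1."""
    T = len(series)
    return np.column_stack([series[lags - l:T - l] for l in range(1, lags + 1)])

def rss(y, Z):
    """Residual sum of squares of the least-squares fit of y on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return float(resid @ resid)

def granger_f_stat(x, y, lags=2):
    """F statistic for H0: the past of x does not improve the forecast of y
    beyond what the past of y already provides."""
    y_now = y[lags:]
    restricted = np.hstack([np.ones((len(y_now), 1)), lagged(y, lags)])
    full = np.hstack([restricted, lagged(x, lags)])
    rss_r, rss_f = rss(y_now, restricted), rss(y_now, full)
    df_num, df_den = lags, len(y_now) - full.shape[1]
    return ((rss_r - rss_f) / df_num) / (rss_f / df_den)

# Toy data: x drives y with a one-step delay, but not the other way around.
rng = np.random.default_rng(0)
T = 500
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * x[t - 1] + 0.5 * rng.standard_normal()

f_x_to_y = granger_f_stat(x, y)   # large: x is Granger causal for y
f_y_to_x = granger_f_stat(y, x)   # small: the past of y carries no extra information about x
```

Comparing the statistic with an F(lags, T - 3·lags - 1) reference distribution gives the usual p-value; the network setting below replaces these pairwise tests with a single penalized regression per variable.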

Extensions involving multiple time series can be handled through the analysis of vector autoregressive (VAR) processes [Lütkepohl, 2005], which provide a convenient framework for analyzing relationships among multiple variables. As a result, the Granger causality framework has recently found diverse applications in the biological sciences, including genetics, bioinformatics and neuroscience, to understand the structure of gene regulation, protein-protein interactions and brain circuitry, respectively. In these applications, the main goal is to reconstruct a network of interactions among the entities involved based on time course data.

It should be noted that the concept of Granger causality is based on associations between time series, and only under very stringent conditions can true causal relationships be inferred [Pearl, 2000]. Nonetheless, this framework provides a powerful tool for understanding the interactions among random variables based on time course data.

Network Granger causality (NGC) extends the notion of Granger causality between two variables to a wider class of p variables. More generally, if $X_1, \ldots, X_p$ are p stationary time series, with $X^t = (X^t_1, \ldots, X^t_p)'$, we consider the class of models

$$X^T = A^1 X^{T-1} + \cdots + A^d X^{T-d} + \varepsilon^T, \tag{1.1}$$

where d, the order of the VAR model, is allowed to be unknown and the innovation process satisfies $\varepsilon^T \sim N(0, \sigma^2 I)$. We call $A^1, \ldots, A^d$ the adjacency matrices from lags $1, \ldots, d$. In this model, $X^{T-t}_j$ is said to be Granger causal for $X^T_i$ if $A^t_{ij}$ is statistically significant. In this case, there exists an edge $X^{T-t}_j \to X^T_i$ in the underlying network model comprising $T \times p$ nodes (see Figure 1). Note that the ordering between the variables in this network, due to their temporal structure, significantly simplifies the network estimation problem [Shojaie and Michailidis, 2010a]. Nevertheless, one still has to estimate a high-dimensional network (e.g., hundreds of genes) from a limited number of samples.

[Figure 1: VAR(2) model with two non-overlapping groups; T = 4, d = 2, p = 6, G = 2.]

Fig 1: An example of a network Granger model with two non-overlapping groups observed over T = 4 time points.

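The following sketch simulates n i.i.d. replicates of model (1.1) over T time points and reads the edges $X^{T-t}_j \to X^T_i$ off the non-zero entries of $A^t$. The particular matrices are hypothetical, chosen only to mimic the setting of Figure 1 (p = 6, d = 2, T = 4, two groups of three variables) with coefficients small enough for the process to be stable; the burn-in period is a simple approximation to starting each replicate from its stationary distribution.

```python
import numpy as np

def simulate_var(A_list, n, T, sigma=1.0, burn_in=200, seed=0):
    """Return an array of shape (T, n, p) holding n i.i.d. replicates of
    X^t = A^1 X^{t-1} + ... + A^d X^{t-d} + eps^t, eps^t ~ N(0, sigma^2 I).
    Replicates are stored as rows, so A^k X^{t-k} becomes X[t-k] @ (A^k)'."""
    rng = np.random.default_rng(seed)
    d, p = len(A_list), A_list[0].shape[0]
    X = np.zeros((burn_in + T, n, p))
    for t in range(d, burn_in + T):
        signal = sum(X[t - k] @ A_list[k - 1].T for k in range(1, d + 1))
        X[t] = signal + sigma * rng.standard_normal((n, p))
    return X[burn_in:]                      # drop the burn-in, keep T time points

# Hypothetical VAR(2) with groups {0,1,2} and {3,4,5}:
# lag 1: group 1 drives group 2; lag 2: group 2 feeds back (negatively) into group 1.
p, d, T, n = 6, 2, 4, 2000
A1 = np.zeros((p, p)); A1[np.arange(3, 6), np.arange(0, 3)] = 0.5
A2 = np.zeros((p, p)); A2[np.arange(0, 3), np.arange(3, 6)] = -0.3
X = simulate_var([A1, A2], n=n, T=T)

# Edges (j, i, t) meaning X_j^{T-t} -> X_i^T, read off the non-zero A^t_{ij}.
edges = [(int(j), int(i), t)
         for t, A in enumerate([A1, A2], start=1)
         for i, j in zip(*np.nonzero(A))]
```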

Estimation of NGC models often arises in the analysis of large panel data in econometrics, where one is interested in understanding the temporal relationships among several economic variables observed over time across a panel of subjects. Such an example is presented in Section 6.1, which examines the structure of the balance sheets of the 50 largest US banks by size over 9 quarterly periods. The high dimensionality of this problem stems not only from the estimation of the coefficients of the adjacency matrices $A^1, \ldots, A^d$, but also from the fact that the order d of the time series is often unknown. Thus, in practice, one must either "guess" the order of the time series (oftentimes it is assumed that the data are generated from a VAR(1) model, which can result in a significant loss of information), or include all of the past time points, resulting in a significant increase in the number of variables in cases where $d \ll T$. Hence, efficient estimation of the order of the time series becomes crucial.

Recent work of Fujita et al. [2007] and Lozano et al. [2009] employed NGC models coupled with penalized $\ell_1$ regression methods to learn gene regulatory mechanisms from time course microarray data. Specifically, Lozano et al. [2009] proposed to group all the past observations, using a variant of the group lasso penalty, in order to construct a relatively simple Granger network model. This penalty takes into account the average effect of the covariates over different time lags and connects Granger causality to this average effect being significant. However, it suffers from a significant loss of information and makes the consistent estimation of the signs of the edges difficult (due to averaging). Shojaie and Michailidis [2010b] proposed a truncating lasso approach, introducing a truncation factor in the penalty term that strongly penalizes the edges from a particular time lag if it corresponds to a highly sparse adjacency matrix.

Despite the recent use of NGC in high-dimensional settings, the theoretical properties of the resulting estimators have not been fully investigated. For example, Lozano et al. [2009] and Shojaie and Michailidis [2010b] discuss consistency of the resulting estimators, but neither addresses selection consistency in depth, nor do they examine under which vector autoregressive structures the obtained results hold. Hence, there is significant room for theoretical work on the performance of penalized estimators in NGC models.

In addition, in many applications structural information about the variables exists, which could improve the estimation of Granger causal models. For example, genes can be naturally grouped according to their function or chromosomal location, stocks according to their industry sectors, assets/liabilities according to their class, etc. This information can be incorporated into the Granger causality framework through a group lasso penalty. If the group specification is correct, it enables estimation of denser networks with limited sample sizes [Bach, 2008, Huang and Zhang, 2010, Lounici et al., 2011]. However, the group lasso penalty can achieve model selection consistency only at the group level. In other words, if the groups are misspecified, this procedure cannot perform within-group variable selection [Huang et al., 2009], an important feature in many applications. To address this issue, we propose a new notion of "direction consistency", and use this notion to introduce a thresholded variant of group lasso for NGC models.

In this paper, we develop a general framework that accommodates different variants of group lasso penalties for NGC models. It allows for the simultaneous estimation of the order of the time series and the Granger causal effects; further, it allows for variable selection even when the groups are misspecified. In summary, the key contributions of this work are: (i) we investigate sufficient conditions that explicitly take into consideration the structure of the VAR(d) model to establish norm and variable selection consistency, (ii) we introduce the novel notion of direction consistency, which generalizes the concept of sign consistency, and use it to establish variable selection consistency of group lasso estimates with misspecified group structures, and (iii) we use the latter notion to introduce an easy-to-compute thresholded variant of group lasso that performs within-group variable selection in addition to group sparsity pattern selection. Application of the proposed framework to data from banks' balance sheets and to temporal regulatory mechanisms related to T-cell activation indicates that the resulting estimates provide novel insight into interactions among components of the system, as well as improved prediction of future values of the variables.

The rest of the paper is organized as follows. In Section 2, we formulate 
the group NGC estimate and its variants. We explain their major advantages 
and briefly discuss the implementation procedure. Section 3 describes the 
notation used and introduces the notion of direction consistency, and dis- 
cusses different assumptions required for the consistency of NGC estimates. 
The theoretical properties of group NGC estimates are discussed in Sec- 
tion 4, where non-asymptotic bounds for their norm and variable selection 
consistency are established. Section 5 reports the results of numerical ex- 
periments, under different settings, and Section 6 applies the different NGC 
methods on two real datasets. 



2. Model and Framework. 

2.1. Notation. Consider a VAR model

$$\underset{p\times 1}{X^T} = \underset{p\times p}{A^1} X^{T-1} + \cdots + A^d X^{T-d} + \varepsilon^T \tag{2.1}$$

observed over T time points, $t = 1, \ldots, T$, with innovation process $\varepsilon^T \sim N(0, \sigma^2 I_{p\times p})$. The index set of the variables $N_p = \{1, 2, \ldots, p\}$ can be partitioned into G non-overlapping groups $\mathcal{G}_g$, i.e., $N_p = \cup_{g=1}^{G} \mathcal{G}_g$ and $\mathcal{G}_g \cap \mathcal{G}_{g'} = \emptyset$ if $g \neq g'$, where $k_g = |\mathcal{G}_g|$ denotes the size of the $g$-th group and $k_{\max} = \max_{1\le g\le G} k_g$.

For any matrix A, we denote the $i$-th row by $A_{i\cdot}$, the $j$-th column by $A_{\cdot j}$, and the collection of rows (columns) corresponding to the $g$-th group by $A_{[g]\cdot}$ ($A_{\cdot[g]}$). The transpose of a matrix A is denoted by $A'$ and its Frobenius norm by $\|A\|_F$. The symbol $A^{1:h}$ is used to denote the concatenated matrix $[A^1 : \cdots : A^h]$. Further, for notational convenience, we reserve the symbol $\|\cdot\|$ to denote the $\ell_2$ norm of a vector and/or the spectral norm of a matrix. Any other norm will be indexed explicitly (e.g., $\|\cdot\|_1$, $\|\cdot\|_{2,2}$, $\|\cdot\|_{2,\infty}$) to avoid confusion. Also, for any vector $\beta$, we use $\beta_j$ to denote its $j$-th coordinate and $\beta_{[g]}$ to denote the coordinates corresponding to the $g$-th group.
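A small sketch of this notation, assuming the groups are given as lists of 0-based column indices (a hypothetical encoding of $\mathcal{G}_1, \ldots, \mathcal{G}_G$): it extracts the blocks $\beta_{[g]}$ and computes the weighted $\ell_{2,1}$ norm $\sum_g w_g \|\beta_{[g]}\|_2$, which is the form of the group lasso penalty used below, and the $\ell_{2,\infty}$ norm $\max_g \|\beta_{[g]}\|_2$.

```python
import numpy as np

def group_norms(beta, groups):
    """Vector of ||beta_[g]||_2 for every group (groups: list of index lists)."""
    return np.array([np.linalg.norm(beta[idx]) for idx in groups])

def norm_21(beta, groups, weights=None):
    """Weighted l_{2,1} norm sum_g w_g ||beta_[g]||_2 (unit weights by default)."""
    w = np.ones(len(groups)) if weights is None else np.asarray(weights, dtype=float)
    return float(w @ group_norms(beta, groups))

def norm_2inf(beta, groups):
    """l_{2,infinity} norm max_g ||beta_[g]||_2."""
    return float(group_norms(beta, groups).max())

groups = [[0, 1, 2], [3, 4, 5]]                  # G = 2 groups of size k_g = 3
beta = np.array([3.0, 4.0, 0.0, 0.0, 0.0, 1.0])  # group norms 5 and 1
```

With the weights $w_g = \sqrt{k_g}$ mentioned for the regular NGC estimates, `norm_21(beta, groups, [np.sqrt(3)] * 2)` equals $6\sqrt{3}$.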

2.2. Network Granger causal (NGC) estimates with group sparsity. Consider n replicates from the NGC model (2.1), and denote the $n \times p$ observation matrix at time t by $\mathcal{X}^t$. For example, in a panel-VAR setting, data on p economic variables for n subjects (firms, households, etc.) can be observed over T time points. The data are high-dimensional if either T or p is large compared to n. In such a scenario, we assume the existence of an underlying group sparse structure, i.e., the support of each row of $A^{1:T-1} = [A^1 : \cdots : A^{T-1}]$ in the model (2.1) can be covered by a small number of groups s, where $s \ll (T-1)G$. Note that the groups can be misspecified, in the sense that the coordinates of a group covering the support need not all be non-zero. Hence, for a properly specified group structure we expect $s \ll \|A^{1:T-1}_{i\cdot}\|_0$. On the contrary, with many misspecified groups, s can be of the same order as, or even larger than, $\|A^{1:T-1}_{i\cdot}\|_0$.
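The distinction between s and $\|A^{1:T-1}_{i\cdot}\|_0$ can be checked directly. The sketch below (hypothetical helper `group_cover`, groups as 0-based index lists) takes one row with four non-zero coefficients and counts the groups needed to cover its support under a grouping that matches the support and under one that splits it.

```python
import numpy as np

def group_cover(row, groups):
    """Return (s, nnz): the number of groups needed to cover the support of `row`,
    and its number of non-zero entries ||row||_0."""
    s = sum(1 for idx in groups if np.any(row[idx] != 0))
    return s, int(np.count_nonzero(row))

row = np.array([1.0, 2.0, 1.0, 3.0] + [0.0] * 8)             # 4 non-zero coefficients
contiguous = [list(range(k, k + 4)) for k in (0, 4, 8)]      # matches the support
interleaved = [list(range(k, 12, 3)) for k in (0, 1, 2)]     # splits the support

print(group_cover(row, contiguous))    # (1, 4): s << ||row||_0
print(group_cover(row, interleaved))   # (3, 4): s of the same order as ||row||_0
```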

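The group Granger causal estimates of the adjacency matrices $A^1, \ldots, A^{T-1}$ are obtained by solving the following optimization problem

$$\hat{A}^{1:T-1} = \operatorname*{argmin}_{A^1, \ldots, A^{T-1} \in \mathbb{R}^{p\times p}} \frac{1}{2n} \Big\| \mathcal{X}^T - \sum_{t=1}^{T-1} \mathcal{X}^{T-t} (A^t)' \Big\|_F^2 + \lambda_n \sum_{t=1}^{T-1} \Psi^t \sum_{i=1}^{p} \sum_{g=1}^{G} w^t_{i,g} \big\| A^t_{i,\mathcal{G}_g} \big\|_2, \tag{2.2}$$

where $\mathcal{X}^t$ is the $n \times p$ observation matrix at time t, constructed by stacking n i.i.d. replicates from the model (2.1), $w^t$ is a $p \times G$ matrix of suitably chosen weights, and $\Psi^t$ is a truncating or thresholding factor, for every t. This optimization problem can be separated into the following p penalized regression problems: for $i = 1, \ldots, p$,

$$\hat{A}^{1:T-1}_{i\cdot} = \operatorname*{argmin}_{\theta^1, \ldots, \theta^{T-1} \in \mathbb{R}^{p}} \frac{1}{2n} \Big\| \mathcal{X}^T_{\cdot i} - \sum_{t=1}^{T-1} \mathcal{X}^{T-t} \theta^t \Big\|_2^2 + \lambda_n \sum_{t=1}^{T-1} \Psi^t \sum_{g=1}^{G} w^t_{i,g} \big\| \theta^t_{[g]} \big\|_2. \tag{2.3}$$

The order d of the VAR model is estimated as $\hat{d} = \max_{1\le t\le T-1} \{t : \hat{A}^t \neq 0\}$.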
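Once the p problems in (2.3) are solved and their rows are stacked back into $\hat{A}^1, \ldots, \hat{A}^{T-1}$, the order estimate is a single scan over lags. A minimal sketch (hypothetical function name; the optional `tol` is an added convenience for treating tiny numerical values as zero, and `tol = 0` reproduces the definition exactly):

```python
import numpy as np

def estimate_order(A_hat, tol=0.0):
    """d_hat = max{t : A_hat^t != 0}, where A_hat[t-1] holds the p x p estimate for lag t.
    Entries with |a| <= tol are treated as zero; returns 0 if every lag is empty."""
    active = [t for t, A in enumerate(A_hat, start=1) if np.any(np.abs(A) > tol)]
    return max(active, default=0)

# Hypothetical estimates for T - 1 = 4 lags with p = 3: lags 1 and 3 are active.
A_hat = [0.5 * np.eye(3), np.zeros((3, 3)), np.diag([0.0, 0.2, 0.0]), np.zeros((3, 3))]
d_hat = estimate_order(A_hat)   # 3
```

An empty intermediate lag (lag 2 here) does not lower the estimate: $\hat{d}$ is the largest active lag, not the number of active lags.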

Different choices of the weights $w^t_{i,g}$ and the truncating/thresholding factors $\Psi^t$ introduce different variants of NGC estimates:

1. Regular: The regular NGC estimates correspond to the choices $\Psi^t = 1$ and $w^t_{i,g} = 1$ or $\sqrt{k_g}$. The estimation procedure requires solving p group lasso penalized regression problems, as described in Section 3. Estimation and selection properties of the estimates are discussed in Section 4.1 under different choices of the tuning parameter $\lambda_n$ and weights $w^t$. In practice, $\lambda_n$ can be tuned through cross-validation, which showed promising results in our numerical work.

2. Adaptive: The adaptive version of NGC estimates corresponds to the choices $w^t_{i,g} = \min\{1, \|\hat{A}^t_{i,\mathcal{G}_g}\|_2^{-1}\}$, where $\hat{A}^t$ are the estimates from regular NGC. This variant of NGC involves a two-stage estimation procedure. In the first stage, only estimates of the adjacency matrices are obtained, but not of the order d. The second stage uses the first-stage estimates to select the weights $w^t_{i,g}$ and yields an improved rate of false positives. The algorithm requires solving p adaptive group lasso problems. The adaptive NGC estimation procedure requires a single tuning parameter $\lambda_n$, which is selected in the same way as in regular NGC. Consistency of adaptive group NGC estimates relies on the consistency of adaptive group lasso estimates [cf. Wei and Huang, 2010].

3. Thresholded: Thresholded NGC estimates are also calculated by a two-stage procedure. The first stage involves a regular NGC estimation procedure, while at the second stage, bi-level thresholding is used. First, the estimated groups with $\ell_2$ norm less than a threshold ($\delta_{grp} = t\lambda$, $t > 0$) are set to zero. The second thresholding (within groups) is applied if the a priori available grouping information is not reliable. The members within each estimated parent group are thresholded using $\delta_{misspec} = \delta_n$ for some $\delta_n \in (0, 1)$. Mathematically, for every $t = 1, \ldots, T-1$, if $j \in \mathcal{G}_g$,

$$\tilde{A}^t_{ij} = \hat{A}^t_{ij} \; \mathbb{I}\big\{\|\hat{A}^t_{i,\mathcal{G}_g}\|_2 > \delta_{grp}\big\} \; \mathbb{I}\Big\{\frac{|\hat{A}^t_{ij}|}{\|\hat{A}^t_{i,\mathcal{G}_g}\|_2} > \delta_{misspec}\Big\}.$$

4. Truncating: A truncating variant of NGC estimates encourages accurate order selection in NGC problems if the Granger causal effects decay over time. Truncating NGC estimates are obtained by solving a non-convex optimization problem via an iterative procedure based on a block-relaxation algorithm suggested in Shojaie and Michailidis [2010b]. This variant corresponds to the choices

$$\Psi^1 = 1, \qquad \Psi^t = \exp\Big[n \, \mathbb{I}\Big\{\sum_{g=1}^{G} \mathbb{I}\{\|A^{t-1}_{\cdot[g]}\|_0 > 0\} < G^2\beta/(T-t)\Big\}\Big], \quad t \ge 2,$$

where $\beta$ denotes a predefined probability of false negatives. Consistent estimation and selection properties of truncating NGC estimates (without any group structure) were discussed in [Shojaie and Michailidis, 2010b] under a decay assumption on the Granger causal effects. Similar properties can be established using the consistency of regular group NGC estimates discussed in Section 4, but are not pursued in this paper.
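The adaptive weights of item 2 can be computed row by row from a first-stage estimate. A minimal sketch for a single lag t, assuming the groups are 0-based column index lists (function name hypothetical). Since $\min\{1, \infty\} = 1$, a block that is exactly zero in the first stage keeps weight 1 under the formula as written.

```python
import numpy as np

def adaptive_weights(A_hat_t, groups):
    """p x G matrix with w^t_{i,g} = min{1, 1 / ||A_hat^t_{i,G_g}||_2} for one lag t.
    A zero block has an infinite reciprocal norm, so its weight stays at 1."""
    p = A_hat_t.shape[0]
    W = np.ones((p, len(groups)))
    for g, idx in enumerate(groups):
        norms = np.linalg.norm(A_hat_t[:, idx], axis=1)   # ||A^t_{i,G_g}||_2 for every row i
        nz = norms > 0
        W[nz, g] = np.minimum(1.0, 1.0 / norms[nz])
    return W

# Hypothetical first-stage estimate with p = 2 rows and two groups of size 2.
A_hat_t = np.array([[3.0, 4.0, 0.0, 0.0],
                    [0.0, 0.0, 0.5, 0.0]])
W = adaptive_weights(A_hat_t, [[0, 1], [2, 3]])   # [[0.2, 1.0], [1.0, 1.0]]
```

Only blocks with first-stage norm above 1 receive a weight below 1, i.e., a lighter penalty in the second stage.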



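3. Assumptions and Conditions. Note that to obtain the solution of the NGC problem, one needs to solve, for each $i = 1, \ldots, p$, a generic group lasso problem of the form

$$\hat{\beta} = \operatorname*{argmin}_{\beta \in \mathbb{R}^{p}} \frac{1}{2n} \|Y - X\beta\|_2^2 + \sum_{g=1}^{G} \lambda_g \|\beta_{[g]}\|_2, \qquad \{1, \ldots, p\} = \cup_{g=1}^{G} \mathcal{G}_g, \ |\mathcal{G}_g| = k_g, \tag{3.1}$$

with $Y = \mathcal{X}^T_{\cdot i}$, $X = [\mathcal{X}^{T-1} : \cdots : \mathcal{X}^1]$, $\beta^0 = \mathrm{vec}(A^{1:T-1}_{i\cdot})$, $\tilde{p} = (T-1)p$, $\tilde{G} = (T-1)G$ and $\lambda_g = \lambda_n w_{i,g}$. For ease of presentation, in the remainder we use p instead of $\tilde{p}$ and G instead of $\tilde{G}$ when examining the properties of the above problem.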
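The paper does not fix a solver for (3.1) at this point; one standard choice, used here purely for illustration, is proximal gradient descent (ISTA): a gradient step on the squared loss followed by block soft-thresholding of each group. The sketch also assembles Y and X from an array of observations and replicates the partition once per lag, so that there are $\tilde{G} = (T-1)G$ groups. All function names are hypothetical, and the fixed iteration count stands in for a proper convergence check.

```python
import numpy as np

def ngc_regression_data(X, i):
    """From observations X of shape (T, n, p) (X[t-1] is the n x p matrix at time t),
    return Y = X^T_{.i} and the n x (T-1)p design [X^{T-1} : ... : X^1]."""
    T = X.shape[0]
    design = np.hstack([X[T - 1 - t] for t in range(1, T)])   # column block t-1 <-> lag t
    return X[T - 1][:, i], design

def stacked_groups(groups, p, n_lags):
    """Repeat the partition of {0, ..., p-1} once per lag, giving (T-1)G groups."""
    return [[t * p + j for j in idx] for t in range(n_lags) for idx in groups]

def group_lasso(Y, X, groups, lam, n_iter=500):
    """Proximal-gradient (ISTA) sketch for
    argmin_b (1/2n)||Y - X b||_2^2 + sum_g lam[g] * ||b_[g]||_2."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        z = b - step * X.T @ (X @ b - Y) / n     # gradient step on the least-squares term
        for g, idx in enumerate(groups):         # prox step: block soft-thresholding
            nrm = np.linalg.norm(z[idx])
            scale = max(0.0, 1.0 - step * lam[g] / nrm) if nrm > 0 else 0.0
            b[idx] = scale * z[idx]
    return b

# Toy check: one active group out of two, common lambda_g = 0.1.
rng = np.random.default_rng(2)
n = 200
groups = [[0, 1, 2], [3, 4, 5]]
beta0 = np.array([1.0, -1.0, 0.5, 0.0, 0.0, 0.0])
Xd = rng.standard_normal((n, 6))
Yd = Xd @ beta0 + 0.1 * rng.standard_normal(n)
b_hat = group_lasso(Yd, Xd, groups, lam=[0.1, 0.1])
```

The block soft-thresholding step sets a whole group exactly to zero whenever its gradient-updated block is short enough, which is how group sparsity arises; within a surviving group the coordinates are only shrunk, never zeroed individually.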

Next, we introduce assumptions needed for establishing norm and variable 
selection consistency for estimators of (3.1). Specifically, for norm consis- 
tency, group variants of compatibility and restricted eigenvalue conditions 
are used, while selection consistency relies on group irrepresentable ones. 
Further, for the problem at hand, we establish a connection between group 
irrepresentable and group compatibility conditions (Appendix D). 

Note that selection consistency of group lasso estimators involves both group-level and within-group selection consistency. Furthermore, due to its inability to perform within-group variable selection, group lasso estimates are not sign consistent whenever the groups are misspecified. To address this, the notion of "direction consistency" is introduced (Section 3.1.1) and the necessity of group (weak) irrepresentable conditions is established (Appendix D).

imsart-aos ver. 2011/11/15 file: draft-v3.tex date: November 5, 2012

BASU, SHOJAIE AND MICHAILIDIS

3.1. Direction Consistency and Irrepresentable Conditions. 

3.1.1. Direction Consistency. As discussed in the introductory section, lasso estimates exhibit the right sparsity pattern and the corresponding signs of the support variables with high probability. However, group lasso achieves sparsity at the group level [Huang et al., 2009], but not necessarily within the group itself. Hence, within-group selection consistency is not guaranteed, and several alternative penalized regression procedures have been proposed to overcome this shortcoming [Breheny and Huang, 2009, Huang et al., 2009, Zhao, Rocha and Yu, 2009]. We formulate a generalized notion of sign consistency, henceforth referred to as "direction consistency", that provides insight into the properties of group lasso estimates within a single group. Subsequently, these properties are used in a simple thresholding variant of the group lasso estimates that achieves within-group variable selection consistency.

Consider a generic group lasso estimate as in (3.1). Without loss of generality, let $S = \{1, \ldots, s\}$ denote the indices of the groups in $\mathrm{support}(\beta^0)$, i.e.,

$$\beta^0 = [(\beta^0_{[1]})', \ldots, (\beta^0_{[s]})', 0, \ldots, 0]', \qquad \beta^0_{[g]} \neq 0 \ \text{for } g \in S = \{1, \ldots, s\}, \qquad \sum_{g \in S} k_g = q.$$

For a vector $\tau \in \mathbb{R}^m \setminus \{0\}$ we define $D(\tau) = \tau/\|\tau\|_2$, and $D(0) = 0$. In general, the function $D(\cdot)$ indicates the direction of the vector $\tau$ in $\mathbb{R}^m$. Specifically, for the problem at hand, for a group $g \in S$ of size m, $D(\beta^0_{[g]})$ indicates the direction of influence of $\beta^0_{[g]}$ at the group level, as it reflects the relative importance of the influential group members. Note that for $m = 1$ the function $D(\cdot)$ simplifies to the usual $\mathrm{sgn}(\cdot)$ function.

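We define an estimate $\hat\beta$ as direction consistent at a rate $\delta_n$ if there exists a sequence of positive real numbers $\delta_n \to 0$ such that

$$P\big(\|D(\hat\beta_{[g]}) - D(\beta^0_{[g]})\|_2 < \delta_n, \ \forall g \in S\big) \to 1 \quad \text{as } n, p \to \infty. \tag{3.2}$$

It readily follows from the definition that if $\hat\beta$ is direction consistent and $S^n_g = \{j \in \mathcal{G}_g : |\beta^0_j|/\|\beta^0_{[g]}\|_2 > \delta_n\}$ denotes the collection of influential group members within a group $\mathcal{G}_g$ that are detectable with a sample size of n, then

$$P\big(\mathrm{sgn}(\hat\beta_j) = \mathrm{sgn}(\beta^0_j), \ \forall j \in S^n_g, \ \forall g \in \{1, \ldots, s\}\big) \to 1 \quad \text{as } n, p \to \infty. \tag{3.3}$$

Indeed, for $j \in S^n_g$ the event in (3.2) gives $|D(\hat\beta_{[g]})_j - D(\beta^0_{[g]})_j| < \delta_n < |D(\beta^0_{[g]})_j|$, so the two coordinates share the same sign.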
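The direction map $D$ and the sign argument behind (3.3) are easy to check numerically. In this sketch the true group and its estimate are hypothetical numbers: $D$ reduces to the sign for scalars, and because the direction gap is below $\delta_n$, the members with $|D(\beta^0_{[g]})_j| > \delta_n$ keep their signs, while nothing is claimed for the zero member.

```python
import numpy as np

def direction(tau):
    """D(tau) = tau / ||tau||_2 for tau != 0, and D(0) = 0."""
    tau = np.asarray(tau, dtype=float)
    nrm = np.linalg.norm(tau)
    return tau / nrm if nrm > 0 else np.zeros_like(tau)

beta0_g = np.array([3.0, -4.0, 0.0])       # a group with one zero (misspecified) member
beta_hat_g = np.array([2.9, -4.1, 0.05])   # a hypothetical estimate of that group
delta_n = 0.1

gap = np.linalg.norm(direction(beta_hat_g) - direction(beta0_g))    # about 0.03 < delta_n
influential = np.flatnonzero(np.abs(direction(beta0_g)) > delta_n)  # members 0 and 1
signs_agree = np.array_equal(np.sign(beta_hat_g[influential]),
                             np.sign(beta0_g[influential]))
```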

Remark 3.1. The latter observation connects the precision of group lasso estimates to the accuracy of the a priori available grouping information. In particular, if the pre-specified grouping structure is correct, i.e., all the members within a group have non-zero effects, then for a sufficiently large sample size we have $S^n_g = \mathcal{G}_g$ and group lasso correctly estimates the signs of all the coordinates. On the other hand, in the case of a misspecified a priori grouping structure, in the form of numerous zero coordinates in $\beta^0_{[g]}$, group lasso correctly estimates only the signs of the strongly influential group members detectable with sample size n.

3.1.2. Group Irrepresentable Conditions. Irrepresentable conditions are common in the literature on high-dimensional regression problems [Zhao and Yu, 2006, van de Geer and Bühlmann, 2009] and are shown to be sufficient (and essentially necessary) for selection consistency of the lasso estimates. Further, these conditions are known to be satisfied with high probability if the population analogue of the Gram matrix belongs to the Toeplitz family. Specifically, if the predictor variables in a group lasso regression problem are generated from an AR process, the design matrix satisfies irrepresentable conditions with high probability. Since we are working with vector AR processes and the population analogue of the Gram matrix, $\mathrm{Var}(X^{1:T-1})$, is block Toeplitz, the irrepresentable assumptions are natural candidates for studying selection consistency of the estimates. Next, we formulate group analogues of these conditions.

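Consider the setup of a group lasso penalized linear model in (3.1) with p regressors partitioned into G groups, of which only the first s groups (of total size q) exert non-zero signal (influence) on the response. We partition the design matrix and the coefficient vector into signal and non-signal parts

$$\underset{n\times p}{X} = [\underset{n\times q}{X_{(1)}} : \underset{n\times (p-q)}{X_{(2)}}], \tag{3.4}$$

$$\beta^0 = [(\beta^0_{[1]})', \ldots, (\beta^0_{[s]})', 0, \ldots, 0]' = [(\beta^0_{(1)})' : (\beta^0_{(2)})']', \qquad k_1 + \cdots + k_s = q, \tag{3.5}$$

$$C = \frac{1}{n} X'X = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}. \tag{3.6}$$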
n 
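Also, for a q-dimensional vector $\tau = [(\tau_{[1]})', \ldots, (\tau_{[s]})']'$ define the stacked direction vector

$$\underset{q\times 1}{\mathcal{D}(\tau)} = [D(\tau_{[1]})', \ldots, D(\tau_{[s]})']'. \tag{3.7}$$

Uniform Irrepresentable Condition is satisfied if there exists $0 < \eta < 1$ such that for all $\tau \in \mathbb{R}^q$ with $\|\tau\|_{2,\infty} := \max_{1\le g\le s} \|\tau_{[g]}\|_2 \le 1$,

$$\big\|(C_{21})_{[g]\cdot} (C_{11})^{-1} \tau\big\|_2 \le 1 - \eta, \qquad \forall g \notin S = \{1, \ldots, s\}. \tag{3.8}$$

Weak Irrepresentable Condition is satisfied if

$$\big\|(C_{21})_{[g]\cdot} (C_{11})^{-1} \mathcal{D}(\beta^0_{(1)})\big\|_2 < 1, \qquad \forall g \notin S = \{1, \ldots, s\}. \tag{3.9}$$

Note that these definitions revert to the usual irrepresentable conditions for lasso estimates when all groups correspond to singletons.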
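For a given design, the weak irrepresentable quantities in (3.9) can be computed directly. This sketch (hypothetical helper; groups as 0-based index lists; the support groups need not come first, since reordering the groups does not change the quantities) contrasts an independent design with one in which the non-signal group is nearly a scaled copy of the signal group. Both designs are made-up toy examples.

```python
import numpy as np

def weak_irrepresentable(X, beta0, groups):
    """For every group g outside the support S, return
    ||(C21)_[g] (C11)^{-1} D(beta0_(1))||_2 with C = X'X / n, as in (3.9)."""
    n = X.shape[0]
    C = X.T @ X / n
    support = [idx for idx in groups if np.any(beta0[idx] != 0)]
    others = [idx for idx in groups if not np.any(beta0[idx] != 0)]
    sig = np.concatenate(support)                                  # coordinates of beta0_(1)
    d_stack = np.concatenate([beta0[idx] / np.linalg.norm(beta0[idx]) for idx in support])
    v = np.linalg.solve(C[np.ix_(sig, sig)], d_stack)              # (C11)^{-1} D(beta0_(1))
    return [float(np.linalg.norm(C[np.ix_(idx, sig)] @ v)) for idx in others]

rng = np.random.default_rng(1)
n, groups = 1000, [[0, 1], [2, 3]]
beta0 = np.array([1.0, 1.0, 0.0, 0.0])
X_indep = rng.standard_normal((n, 4))
Z = rng.standard_normal((n, 2))
X_corr = np.hstack([Z, 1.2 * Z + 0.1 * rng.standard_normal((n, 2))])

ic_indep = weak_irrepresentable(X_indep, beta0, groups)   # small: the condition holds
ic_corr = weak_irrepresentable(X_corr, beta0, groups)     # about 1.2: the condition fails
```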

3.2. Group Restricted Eigenvalue Condition and Group Compatibility Condition. Restricted eigenvalue conditions [Bickel, Ritov and Tsybakov, 2009] ensure minimax optimal $\ell_2$ estimation error in several penalized regression problems [van de Geer and Bühlmann, 2009], while the analogue for group lasso problems is introduced in Lounici et al. [2011]. In the regression framework of (A.1), RE(s, L) is satisfied if there exists a positive number $\phi_{RE} = \phi_{RE}(s) > 0$ such that, with $N_G = \{1, \ldots, G\}$,

$$\min_{J \subset N_G,\, |J| \le s} \ \min_{\Delta \in \mathbb{R}^p \setminus \{0\}} \left\{ \frac{\|X\Delta\|_2}{\sqrt{n}\,\|\Delta_{[J]}\|_2} : \sum_{g \notin J} \lambda_g \|\Delta_{[g]}\|_2 \le L \sum_{g \in J} \lambda_g \|\Delta_{[g]}\|_2 \right\} \ge \phi_{RE}. \tag{3.10}$$

Oracle inequalities for consistency of group lasso estimators in $\ell_{2,1}$ norms under an RE(s, 3) assumption, and consistency in $\ell_2$ norms under an RE(2s, 3) assumption, are discussed in Lounici et al. [2011].

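Following van de Geer and Bühlmann [2009], we introduce a slightly weaker notion called Group Compatibility (GC). For a constant $L > 0$, we say that the GC(S, L) condition holds if there exists a constant $\phi_{compatible} = \phi_{compatible}(S, L) > 0$ such that, for all $\Delta \in \mathbb{R}^p$ satisfying $\sum_{g \notin S} \lambda_g \|\Delta_{[g]}\|_2 \le L \sum_{g \in S} \lambda_g \|\Delta_{[g]}\|_2$,

$$\Big(\sum_{g \in S} \lambda_g \|\Delta_{[g]}\|_2\Big)^2 \le \Big(\sum_{g \in S} \lambda_g^2\Big) \frac{\|X\Delta\|_2^2}{n\,\phi^2_{compatible}}. \tag{3.11}$$

This notion is used to connect the irrepresentable conditions to the consistency results of group lasso estimators in $\ell_{2,1}$ norms. The fact that GC(S, L) holds whenever RE(s, L) is satisfied follows directly from the Cauchy-Schwarz inequality, since $\big(\sum_{g\in S}\lambda_g\|\Delta_{[g]}\|_2\big)^2 \le \big(\sum_{g\in S}\lambda_g^2\big)\|\Delta_{[S]}\|_2^2$.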

4. Main Results. As discussed earlier, a number of authors have investigated the norm consistency of generic group lasso estimates under different assumptions and asymptotic regimes [Bach, 2008, Nardi and Rinaldo, 2008, Wei and Huang, 2010, Lounici et al., 2011]. In particular, Lounici et al. [2011] establish the norm consistency of group lasso estimates under restricted eigenvalue assumptions. Of main interest is to derive conditions that establish the validity of these assumptions in the context of NGC models. This issue is addressed in Sections 4.1 and 4.2. Subsequently, employing the notion of direction consistency introduced in Section 3.1, we establish selection consistency of the generic group lasso estimate, and investigate both the group-level and within-group consistency of thresholded group lasso estimates for NGC.

4.1. Norm consistency of generic group lasso estimates. We start by presenting, for the NGC framework, independent derivations of the results established in Lounici et al. [2011], under slightly different choices of tuning parameters and assumptions. Asymptotically, both estimates share the same convergence rate. However, we use a compatibility condition analogous to the one in van de Geer and Bühlmann [2009], instead of the RE(s, 3) assumption of Lounici et al. [2011], to derive finite sample estimation error bounds in the $\ell_{2,1}$ norm.

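Proposition 4.1. Suppose the GC condition (3.11) holds with L = 3. Choose $\alpha > 1$ and denote $\lambda_{\min} = \min_{1\le g\le G} \lambda_g$. If

$$\lambda_g \ge \ldots$$

for every $g \in N_G$, then the following statements hold with probability at least $1 - 2G^{1-\alpha}$:

$$\frac{1}{n}\|X(\hat\beta - \beta^0)\|_2^2 \le \frac{16}{\phi^2_{compatible}} \sum_{g \in S} \lambda_g^2, \tag{4.1}$$

$$\sum_{g=1}^{G} \|\hat\beta_{[g]} - \beta^0_{[g]}\|_2 \le \frac{16}{\phi^2_{compatible}\,\lambda_{\min}} \sum_{g \in S} \lambda_g^2. \tag{4.2}$$

If, in addition, RE(2s, 3) holds, then, with the same probability we get

$$\|\hat\beta - \beta^0\|_2 \le \frac{4\sqrt{10}}{\phi^2_{RE}(2s)} \, \frac{\sum_{g=1}^{s} \lambda_g^2}{\sqrt{s}\,\lambda_{\min}}. \tag{4.3}$$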

The result shows that group lasso achieves a faster convergence rate than lasso if the groups are appropriately specified. Note that if all groups are of equal size k and $\lambda_g = \lambda$ for all g, then group lasso has an $\ell_2$ estimation error of order $O\big(\sqrt{s}(\sqrt{k} + \sqrt{\log G})/\sqrt{n}\big)$. In contrast, the lasso's error is of order $O\big(\sqrt{\|\beta^0\|_0 \log p / n}\big)$, which establishes that group lasso has a lower error bound if $s \ll \|\beta^0\|_0$. On the other hand, lasso will have a lower error bound if $s \asymp \|\beta^0\|_0$, i.e., if the groups are highly misspecified.
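The comparison above is a matter of arithmetic on the two rates. The sketch evaluates both orders, ignoring constants, for a hypothetical setting (p = 1000 variables in G = 100 groups of size k = 10, n = 200, and 20 non-zero coefficients) that are either packed into 2 groups or spread one per group; the numbers are illustrative and not from the paper's experiments.

```python
import math

def group_lasso_rate(s, k, G, n):
    """Order of the group lasso l2 error: sqrt(s) (sqrt(k) + sqrt(log G)) / sqrt(n)."""
    return math.sqrt(s) * (math.sqrt(k) + math.sqrt(math.log(G))) / math.sqrt(n)

def lasso_rate(nnz, p, n):
    """Order of the lasso l2 error: sqrt(||beta0||_0 log p / n)."""
    return math.sqrt(nnz * math.log(p) / n)

n, p, G, k, nnz = 200, 1000, 100, 10, 20
well_specified = group_lasso_rate(s=2, k=k, G=G, n=n)    # 20 non-zeros fill 2 groups: ~0.53
misspecified = group_lasso_rate(s=20, k=k, G=G, n=n)     # one non-zero per group:     ~1.68
lasso = lasso_rate(nnz=nnz, p=p, n=n)                    # unaffected by grouping:     ~0.83
```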

Next, we investigate when the restricted eigenvalue and compatibility conditions hold. Raskutti, Wainwright and Yu [2010] and Rudelson and Zhou [2011] discuss the RE assumption for lasso for different families of random design matrices and error distributions. In particular, Raskutti, Wainwright and Yu [2010] show that the restricted eigenvalue condition for lasso holds with high probability if the sample size is large enough ($n \gtrsim q \log p$) and the minimum eigenvalue of the covariance matrix of each row of the design matrix (i.e., $\lambda_{\min}(\Sigma)$) is bounded away from 0. The following is an adaptation of that result, tailored to group lasso regression.



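Proposition 4.2. Consider a generic group lasso regression (3.1) with a Gaussian random design matrix $X \in \mathbb{R}^{n\times p}$ whose rows are i.i.d. $N(0, \Sigma)$. If $\Sigma^{1/2}$ satisfies RE(s, 3) with a constant $\phi_{RE}$ (which holds trivially if $\lambda_{\min}(\Sigma) > 0$), then there exist universal positive constants $c, c', c''$ such that if the sample size n satisfies

$$n > c'' \, \frac{\rho^2(\Sigma)}{\phi_{RE}^2} \left(\frac{\lambda_{\max}}{\lambda_{\min}}\right)^2 s\left(\sqrt{k_{\max}} + \sqrt{\log G}\right)^2, \qquad \text{where } \rho^2(\Sigma) = \max_{1\le g\le G} \|\Sigma_{[g][g]}\| \text{ and } \lambda_{\max} = \max_{1\le g\le G} \lambda_g,$$

then X also satisfies RE(s, 3) with $\phi_{RE}/8$ with probability at least $1 - c'\exp(-cn)$.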

4.2. Norm Consistency of Group NGC estimates. In view of the above results, norm consistency of the regular group NGC estimates holds under an appropriate asymptotic regime if both the restricted eigenvalue and group compatibility conditions are satisfied with high probability. The following result, together with Proposition 4.2, achieves this objective. Specifically, it shows that for a regular NGC estimation problem (2.3), $\lambda_{\min}(\Sigma)$ is bounded away from 0 as long as the underlying VAR model is stable [cf. Lütkepohl, 2005], with its cross spectral density and the true adjacency matrices bounded above in spectral norm.

Proposition 4.3. Consider a stable, stationary VAR(d) model of the form (2.1). Let $\Sigma = \mathrm{Var}(X^{1:T-1})$ and let $f(\theta)$, $\theta \in [-\pi, \pi]$, denote the cross spectral density of the process. Suppose the spectral norm of the characteristic polynomial $\mathcal{A}(z) = I - A^1 z - A^2 z^2 - \cdots - A^d z^d$ evaluated on the circle $|z| = 1$ is bounded above, i.e., there exists $M > 0$ such that $\|\mathcal{A}(e^{i\theta})\| \le M$ for $\theta \in [-\pi, \pi]$. Then $\lambda_{\min}(\Sigma) \ge \sigma^2/M^2$. In particular, such an M exists when $m := \max_t \|A^t\| < \infty$, since then $\|\mathcal{A}(e^{i\theta})\| \le 1 + dm$.

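Corollary 4.4. If the maximum incoming and outgoing effects at every node are bounded above, i.e., if

$$v_{in} = \max_{1\le i\le p} \sum_{t=1}^{d} \sum_{j=1}^{p} |A^t_{ij}| < \infty, \qquad v_{out} = \max_{1\le j\le p} \sum_{t=1}^{d} \sum_{i=1}^{p} |A^t_{ij}| < \infty, \tag{4.4}$$

then $\lambda_{\min}(\Sigma)$ is bounded away from 0.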

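Proof. This corollary is a simple consequence of the above proposition together with the following result relating different norms of a matrix [see e.g. Golub and Van Loan, 1996, Cor 2.3.2],

$$\|A^t\| \le \sqrt{\|A^t\|_1 \|A^t\|_\infty} \le \|A^t\|_1 + \|A^t\|_\infty, \qquad t = 1, \ldots, d,$$

and the definitions

$$\|A^t\|_1 = \max_{1\le j\le p} \sum_{i=1}^{p} |A^t_{ij}|, \qquad \|A^t\|_\infty = \max_{1\le i\le p} \sum_{j=1}^{p} |A^t_{ij}|,$$

which give $\max_t \|A^t\| \le v_{in} + v_{out} < \infty$.

□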
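A numerical illustration of these quantities for the simplest case, a VAR(1) with a single lag in the design, where $\Sigma$ reduces to the stationary covariance $\Gamma(0)$ solving $\Gamma(0) = A\Gamma(0)A' + \sigma^2 I$. For this case a standard spectral-density argument gives $\lambda_{\min}(\Gamma(0)) \ge \sigma^2/M^2$ with $M = \max_\theta \|I - Ae^{i\theta}\|$. The sketch uses a hypothetical $2 \times 2$ transition matrix, computes $v_{in}$ and $v_{out}$ from (4.4), checks the norm inequality used in the proof, and compares the two sides of the eigenvalue bound. The $\theta$ grid and the fixed-point iteration for $\Gamma(0)$ are illustrative choices.

```python
import numpy as np

def in_out_effects(A_list):
    """(v_in, v_out) from (4.4): max row sum and max column sum of sum_t |A^t|."""
    total = sum(np.abs(A) for A in A_list)
    return float(total.sum(axis=1).max()), float(total.sum(axis=0).max())

def char_poly_norm_max(A_list, n_grid=2049):
    """Grid approximation of max_theta ||I - sum_t A^t e^{i t theta}|| (spectral norm)."""
    p = A_list[0].shape[0]
    best = 0.0
    for theta in np.linspace(-np.pi, np.pi, n_grid):
        Az = np.eye(p) - sum(A * np.exp(1j * t * theta) for t, A in enumerate(A_list, start=1))
        best = max(best, np.linalg.norm(Az, 2))
    return best

def stationary_cov_var1(A, sigma2=1.0, n_iter=2000):
    """Gamma(0) of a stable VAR(1), by iterating Gamma <- A Gamma A' + sigma2 I."""
    Gamma = sigma2 * np.eye(A.shape[0])
    for _ in range(n_iter):
        Gamma = A @ Gamma @ A.T + sigma2 * np.eye(A.shape[0])
    return Gamma

A = np.array([[0.5, 0.3], [0.0, 0.4]])                  # hypothetical stable VAR(1)
v_in, v_out = in_out_effects([A])                        # (0.8, 0.7)
M = char_poly_norm_max([A])
lam_min = np.linalg.eigvalsh(stationary_cov_var1(A)).min()
```

In this example the bound is far from tight (roughly 1.12 against 0.38), which is typical: the proposition only needs $\lambda_{\min}(\Sigma)$ to stay away from 0.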



The following theorem is an immediate corollary of the above results. 

Theorem 4.5. Consider an NGC estimation problem (2.3). Suppose the common design matrix $X = [\mathcal{X}^{T-1} : \cdots : \mathcal{X}^1]$ in the p regression problems satisfies RE(2s, 3) with $s = \max_i |pa_i|$, where $pa_i$ denotes the set of parent nodes of $X_i$ in the network. Consider the asymptotic regimes $G \asymp n^{a}$, $a > 0$, and $s = O(n^{c_1})$, $k_{\max} = O(n^{c_2})$, $0 < c_1, c_2 < 1$, such that $\sqrt{s}(\sqrt{k_{\max}} + \sqrt{\log G})/\sqrt{n} = o(1)$. Then, for a suitably chosen sequence of $\lambda_n$, we have

$$\hat{A}^{1:d} \to A^{1:d}$$

in probability, as $n, p \to \infty$.



4.3. Selection consistency for generic group lasso estimates. Next, we discuss the selection consistency properties of a generic group lasso regression problem with a common tuning parameter across groups, i.e., $\lambda_g = \lambda$ for every $g \in N_G$. Similar results can be obtained for more general choices of the tuning parameters.

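Theorem 4.6. Assume that the group uniform irrepresentable condition holds with $1 - \eta$ for some $\eta > 0$. Then, for any choice of

$$\lambda > \ldots \, \frac{\sqrt{k_g} + \sqrt{\alpha \log G}}{\sqrt{n}}, \qquad \delta_n > \max_{g \in S} \ldots \, \lambda\sqrt{s}\,\|(C_{11})^{-1}\| + \ldots,$$

with probability greater than $1 - 4G^{1-\alpha}$ there exists a solution $\hat\beta$ satisfying

1. $\hat\beta_{[g]} = 0$ for all $g \notin S$,

2. $\|\hat\beta_{[g]} - \beta^0_{[g]}\|_2 < \delta_n \|\beta^0_{[g]}\|_2$, and hence $\|D(\hat\beta_{[g]}) - D(\beta^0_{[g]})\|_2 < 2\delta_n$, for all $g \in S$. If $\delta_n < 1$, then $\hat\beta_{[g]} \neq 0$ for all $g \in S$.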

Remark 4.7. The tuning parameter λ can be chosen of the same order as required for $\ell_2$ consistency to achieve selection consistency within groups in the sense of (3.3). Further, with the above choice of λ, $\delta_n$ can be chosen of the order $O\big(\sqrt{s}(\sqrt{k_{\max}} + \sqrt{\log G})/\sqrt{n}\big)$. Thus, group lasso correctly identifies the group sparsity pattern if $\sqrt{s}(\sqrt{k_{\max}} + \sqrt{\log G})/\sqrt{n} \to 0$, the same scaling required for $\ell_2$ consistency.



Note that the second part of Theorem 4.6 also shows that group lasso estimates are direction consistent under the same scaling, and hence a thresholded version of the estimates selects all important variables with high probability, as discussed in Section 4.4. It can be shown that the weak irrepresentable condition is necessary for direction consistency of the group lasso estimates under mild regularity conditions on the design matrices. In addition, analogously to the result in [van de Geer and Bühlmann, 2009], it can be shown that a slightly stronger version of the uniform irrepresentable condition implies group compatibility conditions for group lasso estimates. We refer to Appendix D for a detailed discussion of these connections.

4.4. Thresholding in Group NGC estimators. As described in Section 2.2, regular group NGC estimates can be thresholded at both the group and the coordinate levels. The first level of thresholding is motivated by the fact that lasso can select too many false positives [cf. van de Geer, Bühlmann and Zhou [2011], Zhou [2010] and the references therein]. We propose hard-thresholding regular group NGC estimates using a threshold $\delta_{grp} = C\lambda$ for some suitably chosen constant C. The second level of thresholding employs the direction consistency of regular group NGC estimates to perform within-group variable selection with high probability. At this level, we hard-threshold a coordinate $j \in \mathcal{G}_g$ to zero if the corresponding coordinate of $D(\hat\beta_{[g]})$ is lower than a threshold $\delta_n \in (0, 1)$ in absolute value. In view of Theorem 4.6, the within-group thresholding selects the group members with a strong enough signal relative to the other members of that group. The following result demonstrates the benefit of these two types of thresholding. Note that the thresholding at the group level relies only on a weak GC(S, 3) condition, while the within-group thresholding requires a stronger irrepresentable condition.

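Theorem 4.8. Consider a generic group lasso regression problem (3.1) with common tuning parameter $\lambda_g = \lambda$.

i) Assume the GC(S, 3) condition of (3.11) holds with a constant $\phi = \phi_{compatible}$ and define
$$\tilde\beta_{[g]} = \hat\beta_{[g]}\,\mathbb{1}\left\{\|\hat\beta_{[g]}\| > 4\lambda\right\}, \quad g \in N_G.$$
If $\tilde S = \{g \in N_G : \tilde\beta_{[g]} \neq 0\}$, then $|\tilde S \setminus S| \le 12s/\phi^2_{compatible}$ with probability at least $1 - 2G^{1-a}$.

ii) Assume that the uniform irrepresentable condition holds with $1 - \eta$ for some $\eta > 0$. Choose $\lambda$ and $\delta_n$ as in Theorem 4.6, and let $\hat\beta^{thgrp}$ denote the estimator obtained by additionally applying the within group thresholding described above. Then, with probability at least $1 - 4G^{1-a}$, $\hat\beta^{thgrp}$ selects all important variables (the non-zero members of the groups in $S$) if $|\beta^0_j| > 2\delta_n\|\beta^0_{[g]}\|$ for all $j \in \mathcal{G}_g \cap \mathrm{supp}(\beta^0)$ and all $g \in S$, i.e., the effect of every non-zero member in a group is "visible" relative to the total effect from the group.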
Fig 2: Estimated adjacency matrices of a misspecified NGC model: (a) True, (b) Lasso, (c) Group Lasso, (d) Thresholded Group Lasso

In NGC settings where information about the temporal decay of the edge density of the network is available, a third level of thresholding is useful. Specifically, one can shrink to zero all the coefficients at each time lag for which the total number of edges does not exceed a prespecified threshold, chosen to account for a predefined probability of false negatives. In this work, we do not pursue such estimators further.
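The group-level and within group thresholding rules of Section 4.4 can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation; the function name, the representation of groups as lists of index arrays, and keeping a group only when its norm strictly exceeds $\delta_{grp}$ are our assumptions.

```python
import numpy as np


def threshold_group_estimate(beta_hat, groups, delta_grp, delta_within):
    """Two-level hard thresholding of a group lasso estimate (illustrative sketch).

    beta_hat     : 1-D array of estimated coefficients.
    groups       : list of integer index arrays, one per group.
    delta_grp    : group-level threshold (e.g. C * lambda).
    delta_within : within-group threshold in (0, 1) applied to D(beta_g) = beta_g / ||beta_g||.
    """
    beta = np.array(beta_hat, dtype=float)  # work on a copy
    for idx in groups:
        b = beta[idx]  # fancy indexing returns a copy
        norm = np.linalg.norm(b)
        if norm <= delta_grp:
            # Level 1: drop the whole group when its l2 norm does not exceed delta_grp.
            beta[idx] = 0.0
            continue
        # Level 2: drop members whose coordinate of the direction D(beta_g) is small.
        direction = b / norm
        b[np.abs(direction) < delta_within] = 0.0
        beta[idx] = b
    return beta


groups = [np.array([0, 1]), np.array([2, 3, 4])]
print(threshold_group_estimate([0.05, 0.02, 1.0, 0.1, -0.8], groups, 0.2, 0.2))
```

In the example, the first group falls below $\delta_{grp}$ and is removed entirely, while in the second group only the coordinate with a small share of the group direction is set to zero; with `delta_within = 0` the sketch reduces to group-level thresholding alone.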

5. Performance Evaluation. We evaluate the performance of regular, adaptive and thresholded variants of the group NGC estimators through an extensive simulation study, and compare the results to those obtained from lasso estimates. A standard R package (grpreg [Breheny and Huang, 2009]) was used to obtain the estimates.

The settings considered are: 

1. Balanced groups of equal size: The parameters are as follows: i.i.d. samples of size n = 60, 110, 160 are generated from lag-2 (d = 2) VAR models on T = 5 time points, comprising p = 60, 120, 200 nodes partitioned into groups of equal size in the range 3-5.

2. Unbalanced groups: In this case, the corresponding node set is partitioned into one larger group of size 10 and many groups of size 5.

3. Misspecified balanced groups: The parameters are as follows: i.i.d. samples of size n = 60, 110, 160 are generated from lag-2 (d = 2) VAR models on T = 10 time points, comprising p = 60, 120 nodes partitioned into groups of equal size 6. Further, each group has a 30% misspecification rate, namely, for every parent group of a downstream node, 30% of the group members do not exert any effect on it.
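To make the settings concrete, the following NumPy sketch generates i.i.d. replicates of a grouped lag-$d$ VAR with optional misspecification. Simulation details not stated above (one parent group per node and lag, coefficients of magnitude 0.2 with random signs, standard normal noise and initial values) are our assumptions, and `simulate_grouped_var` is a hypothetical helper, not the code used for the reported tables.

```python
import numpy as np


def simulate_grouped_var(n, p, T, group_size, d=2, n_parent_groups=1,
                         misspec_rate=0.0, coef=0.2, sigma=1.0, seed=0):
    """Simulate n i.i.d. replicates of a lag-d VAR on T time points whose
    coefficient matrices have group structure (illustrative assumptions only)."""
    if p % group_size != 0:
        raise ValueError("p must be a multiple of group_size")
    rng = np.random.default_rng(seed)
    groups = [np.arange(start, start + group_size) for start in range(0, p, group_size)]
    A = np.zeros((d, p, p))  # A[h - 1][i, j]: effect of node j at lag h on node i
    n_off = int(round(misspec_rate * group_size))  # group members with no effect
    for h in range(d):
        for i in range(p):
            parents = rng.choice(len(groups), size=n_parent_groups, replace=False)
            for g in parents:
                # Misspecification: a random subset of the parent group exerts no effect.
                active = rng.permutation(groups[g])[n_off:]
                A[h, i, active] = coef * rng.choice([-1.0, 1.0], size=active.size)
    X = np.zeros((n, T, p))
    X[:, :d, :] = rng.normal(scale=sigma, size=(n, d, p))  # initial values
    for t in range(d, T):
        signal = sum(X[:, t - h, :] @ A[h - 1].T for h in range(1, d + 1))
        X[:, t, :] = signal + rng.normal(scale=sigma, size=(n, p))
    return X, A, groups


X, A, groups = simulate_grouped_var(n=60, p=60, T=10, group_size=6, misspec_rate=0.3)
print(X.shape, np.count_nonzero(A[0, 0]))  # (60, 10, 60) 4
```

With groups of size 6, a 30% rate rounds to two inactive members per parent group in this sketch.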

The best tuning parameter $\lambda$ is chosen by a grid search over the interval $[C_1\lambda_e, C_2\lambda_e]$, where $\lambda_e = \sqrt{2\log p/n}$ for lasso and $\lambda_e = \sqrt{2\log G/n}$ for group lasso, using 19:1 sample splitting. The thresholding parameters are set to $\delta_{grp} = 0.7\lambda$ at the group level and to $\delta_{misspec}$, a negative power of $n$, within groups. Finally, within group thresholding is applied only when the group structure is misspecified.
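The tuning step can be sketched with a simple proximal-gradient solver for the group lasso objective $\frac{1}{n}\|Y - X\beta\|^2 + 2\lambda\sum_g\|\beta_{[g]}\|$ and a single random 19:1 train/validation split. The solver is a stand-in chosen for self-containment (grpreg uses its own algorithms), and the grid endpoints $C_1 = 0.1$, $C_2 = 2$, the grid size and the simulated data are hypothetical.

```python
import numpy as np


def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for (1/n)||y - X b||^2 + 2 * lam * sum_g ||b_g||."""
    n, p = X.shape
    step = n / (2.0 * np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        z = b + step * (2.0 / n) * (X.T @ (y - X @ b))  # gradient step on the loss
        for idx in groups:  # proximal step: group-wise soft-thresholding
            nz = np.linalg.norm(z[idx])
            z[idx] = 0.0 if nz <= 2.0 * lam * step else (1.0 - 2.0 * lam * step / nz) * z[idx]
        b = z
    return b


def select_lambda(X, y, groups, lam_grid, seed=0):
    """Choose lam by one random 19:1 train/validation split (validation MSE)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[0])
    n_val = max(1, X.shape[0] // 20)
    val, train = perm[:n_val], perm[n_val:]
    errs = [np.mean((y[val] - X[val] @ group_lasso(X[train], y[train], groups, lam)) ** 2)
            for lam in lam_grid]
    return lam_grid[int(np.argmin(errs))], errs


# Hypothetical example: 10 groups of size 3, the first two groups active.
rng = np.random.default_rng(1)
n, G, k = 200, 10, 3
groups = [np.arange(g * k, (g + 1) * k) for g in range(G)]
X = rng.normal(size=(n, G * k))
beta = np.zeros(G * k)
beta[:6] = 1.0
y = X @ beta + rng.normal(size=n)
lam_e = np.sqrt(2 * np.log(G) / n)
lam_grid = np.linspace(0.1 * lam_e, 2.0 * lam_e, 10)
lam_best, errs = select_lambda(X, y, groups, lam_grid)
print(lam_best / lam_e)
```

A single split is cheap but noisy when $n$ is small (with $n = 60$ only three observations are held out), which is one reason the choice of grid endpoints matters.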

The following performance metrics were used for comparison purposes: (i) Precision = TP/(TP + FP), (ii) Recall = TP/(TP + FN) and (iii) Matthews correlation coefficient (MCC), defined as
$$\mathrm{MCC} = \frac{(TP \times TN) - (FP \times FN)}{\left((TP + FP) \times (TP + FN) \times (TN + FP) \times (TN + FN)\right)^{1/2}},$$
where TP, TN, FP and FN correspond to true positives, true negatives, false positives and false negatives in the estimated network, respectively.
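The three metrics can be computed from true and estimated coefficient arrays as in this sketch; treating coefficients with absolute value above `tol` as edges and returning 0 when a metric's denominator vanishes are our conventions, not necessarily the paper's.

```python
import numpy as np


def edge_metrics(A_true, A_hat, tol=0.0):
    """Precision, recall and Matthews correlation coefficient of an estimated edge set."""
    true_edge = np.abs(np.asarray(A_true, dtype=float)) > tol
    est_edge = np.abs(np.asarray(A_hat, dtype=float)) > tol
    tp = int(np.sum(true_edge & est_edge))
    fp = int(np.sum(~true_edge & est_edge))
    fn = int(np.sum(true_edge & ~est_edge))
    tn = int(np.sum(~true_edge & ~est_edge))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Python ints avoid overflow in the product for large networks.
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


print(edge_metrics([[1, 0], [1, 0]], [[1, 1], [0, 0]]))  # (0.5, 0.5, 0.0)
```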

The results for the balanced settings are given in Table 1. The averages and standard deviations (in parentheses) of the performance metrics are presented for each setup. The Recall for p = 60 shows that even for a network with $60 \times (5 - 1) = 240$ nodes and $|E| = 351$ true edges, the group NGC estimators recover about 71% of the true edges with a sample size as low as n = 60, while lasso based NGC estimates recover only 31% of the true edges. The three group NGC estimates have comparable performance in all cases. However, thresholded group lasso shows slightly higher precision than the other group NGC variants for smaller sample sizes (e.g., n = 60, p = 200). The results for p = 60, n = 110 also show that the lower precision of lasso is caused partially by its inability to estimate the order of the VAR model correctly, as measured by ERR LAG = (number of falsely connected edges from lags beyond the true order of the VAR model)/(number of edges in the network, $|E|$). This finding is illustrated in Figure 2 and Table 1. The group penalty encourages edges from the nodes of the same group to be picked up together. Since the nodes of the same group are also from the same time lag, the group variants have substantially lower ERR LAG. For example, the average ERR LAG of lasso for p = 200, n = 160 is 19.79%, while the average ERR LAGs for the group lasso variants are in the range 3.06%-4.21%.
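ERR LAG can be computed as below, assuming (hypothetically) that estimates are stored in an array whose first axis indexes lags $1, \ldots, L$ with $L$ larger than the true order $d$, so that every edge at a lag beyond $d$ is a false one.

```python
import numpy as np


def err_lag(A_hat, true_order, n_true_edges, tol=0.0):
    """ERR LAG: estimated edges at lags beyond the true VAR order, divided by |E|.

    A_hat        : array of shape (max_lag, p, p); A_hat[h - 1] holds lag-h coefficients.
    true_order   : true order d of the VAR model.
    n_true_edges : number |E| of edges in the true network.
    """
    beyond = np.abs(np.asarray(A_hat, dtype=float)[true_order:]) > tol
    return int(beyond.sum()) / n_true_edges


A_hat = np.zeros((4, 3, 3))
A_hat[0, 0, 0] = 1.0   # lag 1: within the true order, not counted
A_hat[2, 0, 1] = 0.5   # lag 3: false edge beyond the true order
A_hat[3, 1, 1] = -0.2  # lag 4: false edge beyond the true order
print(err_lag(A_hat, true_order=2, n_true_edges=10))  # 0.2
```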

The results for the unbalanced networks are given in Table 2. As in the balanced group setup, in almost all the simulation settings the group NGC variants outperform the lasso estimates with respect to all three performance metrics. However, the different variants of group NGC perform comparably and tend to have higher standard deviations than the lasso estimates. Also, the average ERR LAGs for the group NGC variants are substantially lower than the average ERR LAG for lasso, demonstrating the advantage of the group penalty. Although the conclusions regarding the comparison of lasso and group NGC estimates remain unchanged, it is evident that the performance of all the estimators is affected by the presence of one large group, which skews the uniform nature of the network. For example, the MCC of group NGC estimates in a balanced network with p = 60 and $|E| = 351$ is around 97-98%, which drops to 89-90% when the groups are unbalanced.

The results for misspecified groups are given in Table 3. Note that for larger sample sizes n, the MCC of lasso and regular group lasso are comparable. However, the thresholded version of group lasso (with $\delta_{misspec}$ used for within group selection) achieves significantly higher MCC than the rest. This demonstrates the advantage of using the direction consistency of group lasso estimators to perform within group variable selection. We note that a careful choice of the thresholding parameters $\delta_{grp}$ and $\delta_{misspec}$ via cross-validation or other model selection criteria indicates further improvement in the performance of thresholded group lasso; however, we do not pursue these methods here, as they require a grid search over many tuning parameters or an efficient estimator of the degrees of freedom of group lasso.

In summary, the results clearly show that all variants of group lasso NGC outperform the lasso-based ones whenever the grouping structure of the variables is known and correctly specified. Further, their performance depends on the composition of group sizes. On the other hand, if the a priori known group structure is moderately misspecified, lasso estimates produce results comparable to the regular and adaptive group NGC ones, while thresholded group estimates outperform all other methods, as expected.

6. Application. 

6.1. Example: Banking balance sheets application. In this application, 
we examine the structure of the balance sheets in terms of assets and liabil- 
ities of the n = 50 largest (in terms of total balance sheet size) US banking 
corporations. The data cover 9 quarters (September 2009-September 2011) and were obtained directly from the Federal Deposit Insurance Corporation (FDIC) database (available at www.fdic.gov). The p = 21 variables correspond to different assets (US and foreign government debt securities, equities, loans (commercial, mortgages), leases, etc.) and liabilities (domestic and foreign deposits from households and businesses, deposits from the Federal Reserve Board, deposits of other financial institutions, non-interest bearing liabilities, etc.). We have organized them into four categories: two for the assets (loans and securities) and two for the liabilities (Balances Due and Deposits, based on a $250K FDIC reporting threshold). Amongst the 50 banks examined, one discerns large integrated ones with significant retail, commercial and investment activities (e.g. Citibank, JP Morgan, Bank of America, Wells Fargo), banks primarily focused on investment business (e.g. Goldman Sachs, Morgan Stanley, American Express, E-Trade, Charles Schwab), and regional banks (e.g. Banco Popular de Puerto Rico, Comerica Bank, Bank of the West).

The raw data are reported in thousands of dollars. The few missing values were imputed using a nearest neighbor imputation method with k = 5, by clustering the banks according to their total assets in the most recent quarter (September 2011); every missing observation for a particular bank was then imputed by the median observation of its five nearest neighbors. The data were log-transformed to reduce non-stationarity issues. The dataset was restructured as a panel with p = 21 variables and n = 50 replicates observed over T = 9 time points. Every column of replicates was scaled to have unit variance.
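The preprocessing pipeline just described can be sketched as follows: median imputation from the $k = 5$ banks closest in total assets, a log transform, and per-column scaling. Using absolute differences in total assets as the neighbour distance, breaking ties by bank index, skipping neighbours that are themselves missing, and assuming strictly positive raw values are our reading of the description; the function names are hypothetical.

```python
import numpy as np


def knn_median_impute(data, total_assets, k=5):
    """data: array (n_banks, T, p) with np.nan for missing entries.

    Each missing entry of bank i is replaced by the median of the same entry over
    the k banks closest to i in total assets among those where the entry is observed.
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    ta = np.asarray(total_assets, dtype=float)
    for i in range(data.shape[0]):
        order = np.argsort(np.abs(ta - ta[i]), kind="stable")
        order = order[order != i]  # exclude the bank itself
        for t, v in zip(*np.nonzero(np.isnan(data[i]))):
            observed = [data[j, t, v] for j in order if not np.isnan(data[j, t, v])]
            filled[i, t, v] = np.median(observed[:k])
    return filled


def log_and_scale(data):
    """Log-transform (values assumed > 0), then scale each (time, variable) column
    across banks to unit sample variance."""
    logged = np.log(data)
    return logged / logged.std(axis=0, ddof=1)


raw = np.array([1.0, 2.0, 3.0, np.nan, 5.0, 6.0, 7.0]).reshape(7, 1, 1)
imputed = knn_median_impute(raw, total_assets=np.arange(1, 8))
print(imputed[3, 0, 0])  # median of the neighbours' values 3, 5, 2, 6, 1 -> 3.0
```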

We applied the proposed variants of NGC estimates to the first T = 6 time points (Sep 2009 - Dec 2010) of the above panel dataset. The parameters $\lambda$ and $\delta_{grp}$ were chosen using a 19:1 sample-splitting method, and the misspecification threshold $\delta_{misspec}$ was set to zero, as the grouping structure was reliable. We calculated the MSE of the fitted model in predicting the outcomes in the four quarters (December 2010 - September 2011). The predicted MSEs (MSE for Dec 2010) are listed in Table 4. The estimated network structures are shown in Figures 3 and 4.

It can be seen that the lasso estimates recover a very simple temporal structure amongst the variables; namely, that past values (in this case lag-1) influence present ones. Given the structure of the balance sheet of large banks, this is an anticipated result, since it cannot be radically altered over a short time period due to business relationships and past commitments to customers of the bank. However, the (adaptive) group lasso estimates reveal a richer and more nuanced structure. Examining the estimated adjacency matrices, we notice that the dominant effects remain those discovered by the lasso estimates. However, fairly strong effects are also estimated, not only within each group, but also between the groups of the assets (loans and securities) on the balance sheet. This suggests rebalancing of the balance sheet for risk management purposes between relatively low risk securities and potentially more risky loans. Given the period covered by the data (post financial crisis, starting in September 2009), when credit risk management became of paramount importance, the analysis picks up interesting patterns. On the other hand, significantly fewer associations are discovered amongst the variables on the liabilities side of the balance sheet. Finally, there exist relationships between deposits and securities such as US Treasuries and other domestic ones (primarily municipal bonds); the latter indicates an effort on behalf of the banks to manage the credit risk of their balance sheets, namely by allocating to low risk assets as opposed to more risky loans.

It is also worth noting that the group lasso model exhibits superior pre- 
dictive performance over the lasso estimates, even 4 quarters into the future. 
Finally, in this case the thresholded estimates did not provide any additional 
benefits over the regular and adaptive variants, given that the specification 
of the groups was based on accounting principles and hence correctly struc- 
tured. 

6.2. Example: T-cell activation. Estimation of gene regulatory networks 
from expression data is a fundamental problem in functional genomics [Fried- 
man, 2004]. Time course data coupled with NGC models are informationally 
rich enough for the task at hand. The data for this application come from 
Rangel et al. [2004], where expression patterns of genes involved in T-cell 
activation were studied with the goal of discovering regulatory mechanisms 
that govern them in response to external stimuli. Activated T-cells are in- 
volved in regulation of effector cells (e.g. B-cells) and play a central role 
in mediating immune response. The available data, comprising n = 44 samples of p = 58 genes, measure the cells' response at 10 time points, t = 0, 2, 4, 6, 8, 18, 24, 32, 48, 72 hours after their stimulation with a T-cell receptor independent activation mechanism. We concentrate on data from the first 5 time points, which correspond to early response mechanisms in the cells.

Genes are often grouped based on their function and activity patterns 
into biological pathways. Thus, the knowledge of gene functions and their 
membership in biological pathways can be used as inherent grouping struc- 
tures in the proposed group lasso estimates of NGC. Towards this, we used 
available biological knowledge to define groups of genes based on their biological function. Reliable information on biological functions was found in the literature for 38 genes, which were retained for further analysis. These 38 genes were grouped into 13 groups, with the number of genes in different groups ranging from 1 to 5.

In Shojaie, Basu and Michailidis [2012], we analyzed these data and showed that the decay condition for the truncating lasso penalty seems to be violated in this case, and instead considered estimation of regulatory effects using an adaptive thresholding penalty. Hence, we only consider here the adaptive and thresholded variants of the proposed group lasso estimator for NGC. Figure 5 shows the estimated networks based on lasso and thresholded group lasso estimates, where for ease of representation the nodes of the network represent groups of genes.

In this case, estimates from the variants of the group NGC estimator were all similar, and included a number of known regulatory mechanisms in T-cell activation that are not present in the regular lasso estimate. For instance, Waterman et al. [1990] suggest that TCF plays a significant role in activation of T-cells, which may explain the dominant role of this group of genes in the activation mechanism. On the other hand, Kim et al. [2005] suggest that activated T-cells exhibit high levels of osteoclast-associated receptor activity, which may explain the large number of associations between members of the osteoclast differentiation group and other groups. Finally, the estimated networks based on variants of the group lasso estimator also offer improved estimation accuracy in terms of mean squared error (MSE), despite having complexity comparable to their regular lasso counterpart (Table 5), which further confirms the findings of the other numerical studies in the paper.

7. Discussion. In this paper, the problem of estimating Network Granger Causal (NGC) models with inherent grouping structure is studied when replicates are available. Norm consistency, as well as group level and within group variable selection consistency, are established under fairly mild assumptions on the structure of the underlying time series. To establish within group selection consistency, the novel concept of direction consistency is introduced.

The type of NGC models discussed in this study has wide applicability in different areas, including genomics and economics. However, in many contexts replicates are not available at each time point (e.g. rates of return for stocks or other macroeconomic variables), while grouping structure is still present (e.g. grouping of stocks according to industry sector). Hence, it is of interest to study the behavior of group lasso estimates in such a setting and to address the technical challenges arising from such a pure time series (dependent) data structure.
APPENDIX A: AUXILIARY LEMMAS

Lemma A.1 (Characterization of the group lasso estimate). A vector $\hat\beta$ is a solution to the convex optimization problem
$$\operatorname*{argmin}_{\beta \in \mathbb{R}^p}\ \frac{1}{n}\|Y - X\beta\|^2 + 2\sum_{g=1}^{G}\lambda_g\|\beta_{[g]}\| \qquad \text{(A.1)}$$
if and only if $\hat\beta$ satisfies, for some $\tau \in \mathbb{R}^p$ with $\max_g \|\tau_{[g]}\| \le 1$,
$$\frac{1}{n}\left[X'(Y - X\hat\beta)\right]_{[g]} = \lambda_g\tau_{[g]}, \quad g = 1, \ldots, G,$$
and $\tau_{[g]} = D(\hat\beta_{[g]})$ whenever $\hat\beta_{[g]} \neq 0$.

Proof. The result follows directly from the KKT conditions for the optimization problem (A.1). □
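Lemma A.1 also gives a practical optimality check. The sketch below (function name ours) measures the largest violation of these conditions. It is exercised on an orthonormal design ($X'X/n = I$), where the minimizer of (A.1) is explicit group soft-thresholding, $\hat\beta_{[g]} = (1 - \lambda_g/\|z_{[g]}\|)_+ z_{[g]}$ with $z = X'Y/n$; the data are synthetic.

```python
import numpy as np


def kkt_violation(X, y, beta, groups, lams, tol=1e-10):
    """Largest violation of the Lemma A.1 optimality conditions for
    (1/n)||y - X b||^2 + 2 * sum_g lams[g] * ||b_g||."""
    n = X.shape[0]
    r = X.T @ (y - X @ beta) / n  # (1/n) X'(y - X beta)
    worst = 0.0
    for idx, lam in zip(groups, lams):
        tau = r[idx] / lam
        nb = np.linalg.norm(beta[idx])
        if nb > tol:  # active group: tau_g must equal D(beta_g)
            worst = max(worst, np.linalg.norm(tau - beta[idx] / nb))
        else:         # inactive group: only ||tau_g|| <= 1 is required
            worst = max(worst, np.linalg.norm(tau) - 1.0)
    return worst


# Orthonormal design (X'X/n = I): the solution is explicit group soft-thresholding.
rng = np.random.default_rng(0)
n = 50
groups = [np.arange(0, 3), np.arange(3, 5), np.arange(5, 6)]
lams = [0.3, 0.3, 0.3]
Q, _ = np.linalg.qr(rng.normal(size=(n, 6)))
X = np.sqrt(n) * Q
y = X @ np.array([1.0, -0.5, 0.2, 0.0, 0.0, 0.4]) + rng.normal(size=n)
z = X.T @ y / n
beta_hat = np.zeros(6)
for idx, lam in zip(groups, lams):
    nz = np.linalg.norm(z[idx])
    beta_hat[idx] = max(0.0, 1.0 - lam / nz) * z[idx]
print(kkt_violation(X, y, beta_hat, groups, lams))  # numerically zero (or negative)
```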

Lemma A.2. Let $Z \sim N(0, \Sigma)$ be a $k$-dimensional centered Gaussian random variable. Then, for any $t > 0$, the following concentration inequality holds:
$$P\left(\big|\|Z\| - E\|Z\|\big| > t\right) \le 2\exp\left(-\frac{2t^2}{\pi^2\|\Sigma\|}\right).$$
Further, $E\|Z\| \le \sqrt{k\|\Sigma\|}$.

Proof. The first inequality can be found in Ledoux and Talagrand [1991] (equation (3.2)). To establish the second inequality, note that
$$E\|Z\| \le \sqrt{E\|Z\|^2} = \sqrt{E[\mathrm{tr}(ZZ')]} = \sqrt{\mathrm{tr}(\Sigma)} \le \sqrt{k\|\Sigma\|}.$$
□
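A quick Monte Carlo sanity check of Lemma A.2, using the constants as stated in the lemma and one arbitrary covariance matrix of our choosing (not from the paper): the average norm should not exceed $\sqrt{\mathrm{tr}(\Sigma)} \le \sqrt{k\|\Sigma\|}$, and the empirical deviation frequency at $t = 3\sqrt{\|\Sigma\|}$ should fall below $2\exp(-2t^2/(\pi^2\|\Sigma\|)) = 2e^{-18/\pi^2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 8
B = rng.normal(size=(k, k))
Sigma = B @ B.T / k                      # an arbitrary covariance matrix
op_norm = np.linalg.norm(Sigma, 2)       # spectral norm ||Sigma||

Z = rng.multivariate_normal(np.zeros(k), Sigma, size=100_000)
norms = np.linalg.norm(Z, axis=1)

mean_norm = norms.mean()                 # Monte Carlo estimate of E||Z||
trace_bound = np.sqrt(np.trace(Sigma))   # sqrt(tr(Sigma))
spectral_bound = np.sqrt(k * op_norm)    # sqrt(k ||Sigma||)

t = 3.0 * np.sqrt(op_norm)
tail_freq = np.mean(np.abs(norms - mean_norm) > t)
tail_bound = 2.0 * np.exp(-2.0 * t**2 / (np.pi**2 * op_norm))

print(mean_norm, trace_bound, spectral_bound)
print(tail_freq, tail_bound)
```

The concentration bound is loose here; its value lies in being dimension free, which is what the proofs in Appendices B and C exploit.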

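Lemma A.3. Let $\beta, \hat\beta \in \mathbb{R}^k \setminus \{0\}$. Denote $u = \hat\beta - \beta$ and $r = D(\hat\beta) - D(\beta)$. Then, if $\|u\| < \delta\|\beta\|$, we obtain $\|r\| < 2\delta$.

Proof. It follows from $\|u\| < \delta\|\beta\|$ that
$$(1 - \delta)\|\beta\| < \|\beta\| - \|u\| \le \|\hat\beta\| \le \|u\| + \|\beta\| < (1 + \delta)\|\beta\|,$$
which implies that $\big|\|\hat\beta\| - \|\beta\|\big| < \delta\|\beta\|$. Now,
$$\|r\| = \left\|\frac{\hat\beta}{\|\hat\beta\|} - \frac{\beta}{\|\beta\|}\right\| = \frac{\left\|\hat\beta\left(\|\beta\| - \|\hat\beta\|\right) + u\|\hat\beta\|\right\|}{\|\hat\beta\|\,\|\beta\|} \le \frac{\big|\|\beta\| - \|\hat\beta\|\big|}{\|\beta\|} + \frac{\|u\|}{\|\beta\|} < \delta + \delta = 2\delta.$$
□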
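A randomized check of Lemma A.3 (not a proof): for random non-zero $\beta$ and perturbations $u$ with $\|u\| < \delta\|\beta\|$, the ratio $\|D(\beta + u) - D(\beta)\|/(2\delta)$ should stay below one. The sampling scheme is our own.

```python
import numpy as np


def direction(v):
    """D(v) = v / ||v|| for a non-zero vector v."""
    return v / np.linalg.norm(v)


rng = np.random.default_rng(1)
worst_ratio = 0.0
for _ in range(10_000):
    k = int(rng.integers(1, 6))
    beta = rng.normal(size=k)
    delta = rng.uniform(0.01, 0.99)
    u = rng.normal(size=k)
    # Rescale u so that ||u|| < delta * ||beta||, the hypothesis of Lemma A.3.
    u *= delta * np.linalg.norm(beta) * rng.uniform(0.0, 1.0) / np.linalg.norm(u)
    r = direction(beta + u) - direction(beta)
    worst_ratio = max(worst_ratio, np.linalg.norm(r) / (2.0 * delta))
print(worst_ratio)  # Lemma A.3 says this ratio stays below 1
```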



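APPENDIX B: RESULTS FOR NORM CONSISTENCY

Proof of Proposition 4.1. Since $\hat\beta$ is a solution of the optimization problem (3.1), for all $\beta \in \mathbb{R}^p$ we have
$$\frac{1}{n}\|Y - X\hat\beta\|^2 + 2\sum_{g=1}^{G}\lambda_g\|\hat\beta_{[g]}\| \le \frac{1}{n}\|Y - X\beta\|^2 + 2\sum_{g=1}^{G}\lambda_g\|\beta_{[g]}\|.$$
Plugging in $Y = X\beta^0 + \epsilon$ and simplifying the resulting inequality, we get
$$\frac{1}{n}\|X(\hat\beta - \beta^0)\|^2 \le \frac{1}{n}\|X(\beta - \beta^0)\|^2 + \frac{2}{n}\epsilon'X(\hat\beta - \beta) + 2\sum_{g=1}^{G}\lambda_g\left(\|\beta_{[g]}\| - \|\hat\beta_{[g]}\|\right).$$

Fix $g \in N_G$ and consider the event $\mathcal{A}_g = \left\{\epsilon \in \mathbb{R}^n : \frac{2}{n}\|(X'\epsilon)_{[g]}\| \le \lambda_g\right\}$. Note that $Z = \frac{1}{\sqrt{n}}X'\epsilon \sim N(0, \sigma^2 C)$, so $Z_{[g]} \sim N(0, \sigma^2 C_{[g][g]})$. Then,
$$P(\mathcal{A}_g^c) = P\left(\|Z_{[g]}\| > \frac{\sqrt{n}\lambda_g}{2}\right) \le P\left(\|Z_{[g]}\| - E\|Z_{[g]}\| > \frac{\sqrt{n}\lambda_g}{2} - \sigma\sqrt{k_g\|C_{[g][g]}\|}\right),$$
where the last inequality follows from the second statement of Lemma A.2. Now, let $x_g = \frac{\sqrt{n}\lambda_g}{2} - \sigma\sqrt{k_g\|C_{[g][g]}\|}$. Then, for $x_g > 0$, by the first statement of Lemma A.2, if
$$2\exp\left(-\frac{2x_g^2}{\pi^2\sigma^2\|C_{[g][g]}\|}\right) \le 2G^{-a},$$
we get $P(\mathcal{A}_g^c) \le 2G^{-a}$. But this happens if
$$\sqrt{2}\,x_g \ge \pi\sigma\sqrt{a\log G\,\|C_{[g][g]}\|},$$
which is ensured by the proposed choice of $\lambda_g$.

Next, define $\mathcal{A} := \cap_{g=1}^{G}\mathcal{A}_g$. Then $P(\mathcal{A}) \ge 1 - 2G^{1-a}$, and on the event $\mathcal{A}$, since $\frac{2}{n}|\epsilon'X(\hat\beta - \beta)| \le \sum_g\lambda_g\|\hat\beta_{[g]} - \beta_{[g]}\|$, we have, for all $\beta \in \mathbb{R}^p$,
$$\frac{1}{n}\|X(\hat\beta - \beta^0)\|^2 + \sum_{g=1}^{G}\lambda_g\|\hat\beta_{[g]} - \beta_{[g]}\| \le \frac{1}{n}\|X(\beta - \beta^0)\|^2 + 2\sum_{g=1}^{G}\lambda_g\left(\|\hat\beta_{[g]} - \beta_{[g]}\| + \|\beta_{[g]}\| - \|\hat\beta_{[g]}\|\right).$$
Note that $\|\hat\beta_{[g]} - \beta_{[g]}\| + \|\beta_{[g]}\| - \|\hat\beta_{[g]}\|$ vanishes if $g \notin S(\beta)$, the set of groups with $\beta_{[g]} \neq 0$, and is bounded above by $\min\{2\|\beta_{[g]}\|, 2\|\hat\beta_{[g]} - \beta_{[g]}\|\}$ if $g \in S(\beta)$. This leads to the following sparsity oracle inequality, for all $\beta \in \mathbb{R}^p$,
$$\frac{1}{n}\|X(\hat\beta - \beta^0)\|^2 + \sum_{g=1}^{G}\lambda_g\|\hat\beta_{[g]} - \beta_{[g]}\| \le \frac{1}{n}\|X(\beta - \beta^0)\|^2 + 4\sum_{g \in S(\beta)}\lambda_g\min\left\{\|\beta_{[g]}\|, \|\hat\beta_{[g]} - \beta_{[g]}\|\right\}. \qquad \text{(B.1)}$$
The sparsity oracle inequality (B.1) with $\beta = \beta^0$ and $\Delta := \hat\beta - \beta^0$ leads to the following two useful bounds on the prediction and $\ell_{2,1}$-estimation errors:
$$\frac{1}{n}\|X\Delta\|^2 \le 4\sum_{g \in S}\lambda_g\|\Delta_{[g]}\|, \qquad \text{(B.2)}$$
$$\sum_{g \notin S}\lambda_g\|\Delta_{[g]}\| \le 3\sum_{g \in S}\lambda_g\|\Delta_{[g]}\|. \qquad \text{(B.3)}$$
Now, assume the group compatibility condition (3.11) holds. Then,
$$\frac{1}{n}\|X\Delta\|^2 \le 4\sum_{g \in S}\lambda_g\|\Delta_{[g]}\| \le \frac{4\|X\Delta\|}{\sqrt{n}\,\phi_{compatible}}\sqrt{\sum_{g \in S}\lambda_g^2}, \qquad \text{(B.4)}$$
which implies the first inequality of Proposition 4.1. The second inequality follows from
$$\sum_{g=1}^{G}\lambda_g\|\Delta_{[g]}\| \le 4\sum_{g \in S}\lambda_g\|\Delta_{[g]}\| \le \frac{4\|X\Delta\|}{\sqrt{n}\,\phi_{compatible}}\sqrt{\sum_{g \in S}\lambda_g^2} \le \frac{16}{\phi^2_{compatible}}\sum_{g \in S}\lambda_g^2,$$
where the last step uses (B.4).

The proof of the last inequality of Proposition 4.1, i.e., the upper bound on the $\ell_2$ estimation error under RE(2s), is the same as in Theorem 3.1 of Lounici et al. [2011] and is omitted. □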

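Proof of Proposition 4.3. We note that $\Sigma$ is a $pT \times pT$ block Toeplitz matrix with $(i,j)$th block $(\Sigma_{ij})_{1 \le i,j \le T} := \Gamma(i - j)$, where $\Gamma(\ell)_{p \times p}$ is the autocovariance function of lag $\ell$ for the zero-mean VAR(d) process (2.1), defined as
$$\Gamma(\ell) = E\left[X^t(X^{t-\ell})'\right]. \qquad \text{(B.5)}$$
We consider the cross spectral density of the VAR(d) process (2.1)
$$f(\theta) = \frac{1}{2\pi}\sum_{\ell=-\infty}^{\infty}\Gamma(\ell)e^{-i\ell\theta}, \quad \theta \in [-\pi, \pi]. \qquad \text{(B.6)}$$
From standard results of spectral theory we know that $\Gamma(\ell) = \int_{-\pi}^{\pi}e^{i\ell\theta}f(\theta)\,d\theta$ for every $\ell$.

We want to find a lower bound on the minimum eigenvalue of $\Sigma$, i.e., $\inf_{\|x\| = 1}x'\Sigma x$. Consider an arbitrary $pT$-variate unit norm vector $x$ formed by stacking the $p$-tuples $x^1, \ldots, x^T$.

For every $\theta \in [-\pi, \pi]$ define $G(\theta) = \sum_{t=1}^{T}x^te^{-it\theta}$ and note that
$$\int_{-\pi}^{\pi}G^*(\theta)G(\theta)\,d\theta = \sum_{t=1}^{T}\sum_{\tau=1}^{T}(x^t)'(x^\tau)\int_{-\pi}^{\pi}e^{i(t-\tau)\theta}\,d\theta = \sum_{t=1}^{T}\sum_{\tau=1}^{T}(x^t)'(x^\tau)\,2\pi\,\mathbb{1}\{t = \tau\} = 2\pi\sum_{t=1}^{T}(x^t)'(x^t) = 2\pi\|x\|^2 = 2\pi.$$
Also let $\mu(\theta)$ be the minimum eigenvalue of the Hermitian matrix $f(\theta)$. Following Parter [1961] we have the result
$$x'\Sigma x = \sum_{t=1}^{T}\sum_{\tau=1}^{T}(x^t)'\Gamma(t - \tau)x^\tau = \sum_{t=1}^{T}\sum_{\tau=1}^{T}(x^t)'\left(\int_{-\pi}^{\pi}e^{i(t-\tau)\theta}f(\theta)\,d\theta\right)x^\tau = \int_{-\pi}^{\pi}G^*(\theta)f(\theta)G(\theta)\,d\theta$$
$$\ge \int_{-\pi}^{\pi}\mu(\theta)\,G^*(\theta)G(\theta)\,d\theta \ge \left(\min_{\theta}\mu(\theta)\right)\int_{-\pi}^{\pi}G^*(\theta)G(\theta)\,d\theta = 2\pi\min_{\theta}\mu(\theta).$$
So $\lambda_{min}(\Sigma) \ge 2\pi\min_{\theta \in [-\pi, \pi]}\mu(\theta)$.

If $\mathcal{A}(z) = I - A^1z - A^2z^2 - \cdots - A^dz^d$ is the (matrix-valued) characteristic polynomial of the VAR(d) model (2.1), then we have the following representation (see eqn (9.4.23), Priestley [1981]):
$$f(\theta) = \frac{1}{2\pi}\sigma^2\left(\mathcal{A}(e^{-i\theta})\right)^{-1}\left(\left(\mathcal{A}(e^{-i\theta})\right)^{-1}\right)^*.$$
Thus, $2\pi\mu(\theta) = 2\pi\lambda_{min}(f(\theta)) = 2\pi/\lambda_{max}(f(\theta)^{-1}) = \sigma^2/\|\mathcal{A}(e^{-i\theta})\|^2$. But $\|\mathcal{A}(e^{-i\theta})\| \le 1 + \sum_{t=1}^{d}\|A^t\|$ for every $\theta \in [-\pi, \pi]$. So the minimum eigenvalue of $\Sigma$ is bounded away from zero as long as the spectral norms of the adjacency matrices are bounded above.

□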
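A numerical sketch of Proposition 4.3 for the special case of a stable VAR(1) with noise covariance $\sigma^2 I$ (our choice): it builds the block Toeplitz covariance $\Sigma$ from $\Gamma(\ell) = A^\ell\Gamma(0)$, where $\Gamma(0)$ solves the discrete Lyapunov equation $\Gamma(0) = A\Gamma(0)A' + \sigma^2 I$, and compares $\lambda_{min}(\Sigma)$ with the lower bound $\sigma^2/(1 + \|A\|)^2$ implied by the spectral argument above. The random coefficient matrix is hypothetical.

```python
import numpy as np


def var1_block_toeplitz_cov(A, sigma2, T):
    """Covariance of the stacked vector (X^1, ..., X^T) of a stationary VAR(1)
    X^t = A X^{t-1} + e^t with e^t ~ N(0, sigma2 * I); block (i, j) is Gamma(i - j)."""
    p = A.shape[0]
    # Gamma(0) solves Gamma0 = A Gamma0 A' + sigma2 * I (vectorized Lyapunov equation).
    vec_g0 = np.linalg.solve(np.eye(p * p) - np.kron(A, A), sigma2 * np.eye(p).ravel())
    gammas = [vec_g0.reshape(p, p)]
    for _ in range(1, T):
        gammas.append(A @ gammas[-1])  # Gamma(l) = A Gamma(l - 1) for a VAR(1)
    S = np.zeros((p * T, p * T))
    for i in range(T):
        for j in range(T):
            block = gammas[i - j] if i >= j else gammas[j - i].T  # Gamma(-l) = Gamma(l)'
            S[i * p:(i + 1) * p, j * p:(j + 1) * p] = block
    return S


rng = np.random.default_rng(0)
p, T, sigma2 = 4, 6, 1.0
A = rng.normal(size=(p, p))
A *= 0.6 / np.max(np.abs(np.linalg.eigvals(A)))  # spectral radius 0.6: stable process
S = var1_block_toeplitz_cov(A, sigma2, T)
lam_min = np.linalg.eigvalsh(S).min()
bound = sigma2 / (1.0 + np.linalg.norm(A, 2)) ** 2
print(lam_min, bound)
```

The bound depends only on $\|A\|$, not on $T$, which is the point of the proposition: adding time points cannot drive the minimum eigenvalue to zero.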

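APPENDIX C: RESULTS FOR SELECTION CONSISTENCY

Proof of Theorem 4.6. Consider any solution $\hat\beta_{(1)} \in \mathbb{R}^q$ of the restricted group lasso problem
$$\operatorname*{argmin}_{\beta_{(1)} \in \mathbb{R}^q}\ \frac{1}{n}\|Y - X_{(1)}\beta_{(1)}\|^2 + 2\lambda\sum_{g \in S}\|\beta_{[g]}\|, \qquad \text{(C.1)}$$
and set $\hat\beta = (\hat\beta_{(1)}' : 0_{1 \times (p-q)})'$. We show that such an augmented vector $\hat\beta$ satisfies the statements of Theorem 4.6 with high probability.

Let $u = \hat\beta_{(1)} - \beta^0_{(1)}$. In view of Lemmas A.1 and A.3, it suffices to show that the following events happen with probability at least $1 - 2G^{1-a}$:
$$\|u_{[g]}\| < \delta_n\|\beta^0_{[g]}\|, \quad \text{for all } g \in S, \qquad \text{(C.2)}$$
$$\frac{1}{n}\left\|\left[X'(\epsilon - X_{(1)}u)\right]_{[g]}\right\| < \lambda, \quad \text{for all } g \notin S. \qquad \text{(C.3)}$$
Note that, in view of Lemma A.1, $u = (C_{11})^{-1}\left(\frac{1}{\sqrt{n}}Z_{(1)} - \lambda\tau\right)$ for some $\tau \in \mathbb{R}^q$ with $\|\tau_{[g]}\| \le 1$ for all $g \in S$, where $Z = \frac{1}{\sqrt{n}}X'\epsilon$. Thus, since $\|[(C_{11})^{-1}\tau]_{[g]}\| \le \sqrt{s}\,\|(C_{11})^{-1}\|$, for any $g \in S$,
$$P\left(\|u_{[g]}\| \ge \delta_n\|\beta^0_{[g]}\|\right) \le P\left(\|V_{[g]}\| \ge \sqrt{n}\left(\delta_n\|\beta^0_{[g]}\| - \lambda\sqrt{s}\,\|(C_{11})^{-1}\|\right)\right),$$
where $V = (C_{11})^{-1}Z_{(1)}$. Note that $V \sim N(0, \sigma^2(C_{11})^{-1})$, so $V_{[g]} \sim N(0, \sigma^2C^{11}_{[g][g]})$, where $C^{11}_{[g][g]} := [(C_{11})^{-1}]_{[g][g]}$. Also, by the second statement of Lemma A.2 we have $E\|V_{[g]}\| \le \sigma\sqrt{k_g\|C^{11}_{[g][g]}\|}$. Therefore, by the first statement of Lemma A.2,
$$P\left(\|u_{[g]}\| \ge \delta_n\|\beta^0_{[g]}\|\right) \le 2\exp\left(-\frac{2\left[\sqrt{n}\left(\delta_n\|\beta^0_{[g]}\| - \lambda\sqrt{s}\,\|(C_{11})^{-1}\|\right) - \sigma\sqrt{k_g\|C^{11}_{[g][g]}\|}\right]^2}{\pi^2\sigma^2\|C^{11}_{[g][g]}\|}\right).$$
For the proposed choice of $\delta_n$, the above probability is bounded above by $2G^{-a}$.

Next, for any $g \notin S$, we get
$$P\left(\frac{1}{n}\left\|\left[X'(\epsilon - X_{(1)}u)\right]_{[g]}\right\| \ge \lambda\right) \le P\left(\left\|\left[Z_{(2)} - C_{21}C_{11}^{-1}Z_{(1)}\right]_{[g]}\right\| \ge \sqrt{n}\lambda\left(1 - \left\|\left[C_{21}C_{11}^{-1}\tau\right]_{[g]}\right\|\right)\right).$$
Defining $W = Z_{(2)} - C_{21}C_{11}^{-1}Z_{(1)} \sim N\left(0, \sigma^2(C_{22} - C_{21}C_{11}^{-1}C_{12})\right)$, the uniform irrepresentable condition implies that the above probability is bounded above by $P\left(\|W_{[g]}\| \ge \sqrt{n}\lambda\eta\right)$. It can then be seen that $W_{[g]} \sim N(0, \sigma^2\tilde{C}_{[g][g]})$, where $\tilde{C} = C_{22} - C_{21}C_{11}^{-1}C_{12}$ denotes the Schur complement of $C_{11}$ in $C$. As before, Lemma A.2 establishes that
$$P\left(\|W_{[g]}\| \ge \sqrt{n}\lambda\eta\right) \le 2\exp\left(-\frac{2\left(\sqrt{n}\lambda\eta - \sigma\sqrt{k_g\|\tilde{C}_{[g][g]}\|}\right)^2}{\pi^2\sigma^2\|\tilde{C}_{[g][g]}\|}\right),$$
and the last probability is bounded above by $2G^{-a}$ for the proposed choice of $\lambda$.

The results in the theorem follow by applying the union bound to the two sets of probability statements made across all $g \in N_G$. □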

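Proof of Theorem 4.8. We use the notation developed in the proof of Proposition 4.1. First note that (ii) follows directly from Theorem 4.6. For (i), since the falsely selected groups survive the initial thresholding, we get $\|\Delta_{[g]}\| = \|\hat\beta_{[g]}\| > 4\lambda$ for every such group, where $\Delta = \hat\beta - \beta^0$. This yields an upper bound on the number of such groups:
$$|\tilde{S} \setminus S| \le \sum_{g \in \tilde{S} \setminus S}\frac{\|\Delta_{[g]}\|}{4\lambda} \le \sum_{g \notin S}\frac{\|\Delta_{[g]}\|}{4\lambda}. \qquad \text{(C.4)}$$
Next, note that from the sparsity oracle inequality (B.1) with $\beta = \beta^0$, the following holds on the event $\mathcal{A}$:
$$\sum_{g \notin S}\|\Delta_{[g]}\| \le 3\sum_{g \in S}\|\Delta_{[g]}\|.$$
It readily follows that
$$\sum_{g \notin S}\|\Delta_{[g]}\| \le 3\|\Delta\|_{2,1} \le \frac{48\,s\lambda}{\phi^2_{compatible}},$$
where the last inequality follows from the $\ell_{2,1}$-error bound of (4.2). Using this inequality together with (C.4) gives the result. □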

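APPENDIX D: SUPPLEMENTS

In this section, we discuss two results involving the compatibility and irrepresentable conditions for the group lasso. The first result demonstrates a connection between irrepresentable conditions and compatibility conditions. The second result discusses the necessity of group irrepresentable conditions for direction consistency of the group lasso estimates. The proofs are given under the special choice of tuning parameters $\lambda_g = \lambda\sqrt{k_g}$. Similar results can be derived for a general choice of $\lambda_g$, although their presentation is more involved.

The following result is a generalization of Theorem 9.1 in van de Geer and Bühlmann [2009].

Proposition D.1. Suppose the uniform irrepresentable condition (3.8) holds with $\eta \in [0, 1]$. Then the group compatibility (S, L) condition (3.11) holds whenever $L < \frac{1}{1-\eta}$.

Proof. First note that with the above choice of $\lambda_g$ the group compatibility (S, L) condition simplifies to
$$\phi^2_{compatible} = \min_{\Delta \in \mathbb{R}^p \setminus \{0\}}\left\{\frac{q\,\frac{1}{n}\|X\Delta\|^2}{\left(\sum_{g \in S}\sqrt{k_g}\|\Delta_{[g]}\|\right)^2} : \sum_{g \notin S}\sqrt{k_g}\|\Delta_{[g]}\| \le L\sum_{g \in S}\sqrt{k_g}\|\Delta_{[g]}\|\right\} > 0. \qquad \text{(D.1)}$$
Also, the uniform irrepresentable condition guarantees that there exists $0 < \eta < 1$ such that for all $\tau \in \mathbb{R}^q$ with $\|\tau\|_{2,\infty} = \max_{1 \le g \le s}\|\tau_{[g]}\|_2 \le 1$, we have
$$\left\|\left[C_{21}(C_{11})^{-1}K^0\tau\right]_{[g]}\right\| \le (1 - \eta)\sqrt{k_g}, \quad \forall g \notin S.$$
Here $K^0 = K/\lambda$ is a $q \times q$ block diagonal matrix with $s$ diagonal blocks $\sqrt{k_1}I_{k_1 \times k_1}, \ldots, \sqrt{k_s}I_{k_s \times k_s}$. Define
$$\Delta^0 = \operatorname*{argmin}\left\{\frac{1}{n}\|X\Delta\|^2 : \sum_{g \in S}\sqrt{k_g}\|\Delta_{[g]}\| = 1,\ \sum_{g \notin S}\sqrt{k_g}\|\Delta_{[g]}\| \le L\right\}. \qquad \text{(D.2)}$$
Note that $\frac{1}{n}\|X\Delta^0\|^2 = \phi^2_{compatible}/q$. We introduce two Lagrange multipliers $\lambda$ and $\lambda'$ corresponding to the equality and inequality constraints for solving the optimization problem in (D.2). Also, partition $\Delta^0 = (\Delta^0_{(1)} : \Delta^0_{(2)})$ and $X = [X_{(1)} : X_{(2)}]$ into signal and nonsignal parts as in (3.6). The first $q$ linear equations of the KKT conditions imply that there exists $\tau^0 \in \mathbb{R}^q$ such that
$$C_{11}\Delta^0_{(1)} + C_{12}\Delta^0_{(2)} = \lambda K^0\tau^0, \qquad \text{(D.3)}$$
and, for every $g \in S$,
$$\tau^0_{[g]} = D(\Delta^0_{[g]}) \text{ if } \Delta^0_{[g]} \neq 0, \qquad \|\tau^0_{[g]}\| \le 1 \text{ otherwise}.$$
It readily follows that $(\tau^0)'K^0\Delta^0_{(1)} = \sum_{g \in S}\sqrt{k_g}\|\Delta^0_{[g]}\| = 1$.

Multiplying both sides of (D.3) by $(\Delta^0_{(1)})'$ we get
$$(\Delta^0_{(1)})'C_{11}\Delta^0_{(1)} + (\Delta^0_{(1)})'C_{12}\Delta^0_{(2)} = \lambda. \qquad \text{(D.4)}$$
Also, (D.3) implies
$$\Delta^0_{(1)} + (C_{11})^{-1}C_{12}\Delta^0_{(2)} = \lambda(C_{11})^{-1}K^0\tau^0. \qquad \text{(D.5)}$$
Multiplying both sides of this equation by $(K^0\tau^0)' = (\tau^0)'K^0$ we obtain
$$1 = -(K^0\tau^0)'(C_{11})^{-1}C_{12}\Delta^0_{(2)} + \lambda(K^0\tau^0)'(C_{11})^{-1}(K^0\tau^0). \qquad \text{(D.6)}$$
Note that the absolute value of the first term,
$$\left|(K^0\tau^0)'(C_{11})^{-1}C_{12}\Delta^0_{(2)}\right| = \left|\sum_{g \notin S}(\Delta^0_{[g]})'\left[C_{21}(C_{11})^{-1}K^0\tau^0\right]_{[g]}\right|, \qquad \text{(D.7)}$$
is bounded above by
$$\sum_{g \notin S}\|\Delta^0_{[g]}\|\left\|\left[C_{21}(C_{11})^{-1}K^0\tau^0\right]_{[g]}\right\| \le (1 - \eta)\sum_{g \notin S}\sqrt{k_g}\|\Delta^0_{[g]}\| \le (1 - \eta)L, \qquad \text{(D.8)}$$
by virtue of the uniform irrepresentable condition and the Cauchy-Schwarz inequality.

Assuming the minimum eigenvalue of $C_{11}$, i.e., $\lambda_{min}(C_{11})$, is positive and noting that $\|K^0\tau^0\|^2 \le q$, the second term is at most $\lambda q/\lambda_{min}(C_{11})$. So (D.6) implies
$$1 \le (1 - \eta)L + \frac{\lambda q}{\lambda_{min}(C_{11})}. \qquad \text{(D.9)}$$
In particular, $\lambda \ge \lambda_{min}(C_{11})\left(1 - (1 - \eta)L\right)/q$, which is positive whenever $L < 1/(1 - \eta)$.

Next, multiply both sides of (D.5) by $(\Delta^0_{(2)})'C_{21}$ to get
$$(\Delta^0_{(2)})'C_{21}\Delta^0_{(1)} + (\Delta^0_{(2)})'C_{21}(C_{11})^{-1}C_{12}\Delta^0_{(2)} = \lambda(\Delta^0_{(2)})'C_{21}(C_{11})^{-1}K^0\tau^0. \qquad \text{(D.10)}$$
Using the upper bound in (D.8), the right-hand side is at least $-\lambda(1 - \eta)L$. Also, a simple consequence of the block inversion formula for the non-negative definite matrix $C$ guarantees that the matrix $C_{22} - C_{21}(C_{11})^{-1}C_{12}$ is non-negative definite. Hence,
$$(\Delta^0_{(2)})'C_{22}\Delta^0_{(2)} \ge (\Delta^0_{(2)})'C_{21}(C_{11})^{-1}C_{12}\Delta^0_{(2)},$$
and
$$(\Delta^0_{(2)})'C_{21}\Delta^0_{(1)} + (\Delta^0_{(2)})'C_{22}\Delta^0_{(2)} \ge -\lambda(1 - \eta)L.$$
Putting all the pieces together we get
$$\frac{\phi^2_{compatible}}{q} = \frac{1}{n}\|X\Delta^0\|^2 = \lambda + (\Delta^0_{(2)})'C_{21}\Delta^0_{(1)} + (\Delta^0_{(2)})'C_{22}\Delta^0_{(2)} \ge \lambda - \lambda(1 - \eta)L = \lambda\left(1 - (1 - \eta)L\right),$$
where the equality uses (D.4) and the inequality uses (D.10). Plugging in the lower bound for $\lambda$ we obtain the result; namely,
$$\phi^2_{compatible} \ge \lambda_{min}(C_{11})\left(1 - (1 - \eta)L\right)^2 > 0$$
for any $L < 1/(1 - \eta)$.

□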



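D.1. Necessity of the weak irrepresentable condition for direction consistency. In this section we demonstrate the necessity of the weak irrepresentable condition for group sparsity selection and direction consistency. We shall assume that the minimum eigenvalue of the signal part of the Gram matrix, viz. $\lambda_{min}(C_{11})$, is bounded below. We shall also assume that the matrices $C_{21}$ and $C_{22}$ are bounded above in spectral norm. Suppose that the weak irrepresentable condition does not hold, i.e., for some $g \notin S$ and $\xi > 0$, we have
$$\frac{1}{\sqrt{k_g}}\left\|\left[C_{21}(C_{11})^{-1}K^0D(\beta^0_{(1)})\right]_{[g]}\right\| > 1 + \xi$$
for infinitely many $n$. Also suppose that there exists a sequence of positive reals $\delta_n \to 0$ such that the event
$$E_n := \left\{\|D(\hat\beta_{[g]}) - D(\beta^0_{[g]})\|_2 < \delta_n\ \forall g \in S, \text{ and } \hat\beta_{[g]} = 0\ \forall g \notin S\right\}$$
satisfies $P(E_n) \to 1$ as $p, n \to \infty$.

Note that for large enough $n$, so that $\delta_n < \min_g\|D(\beta^0_{[g]})\|$, we have $\hat\beta_{[g]} \neq 0$ for all $g \in S$ on the event $E_n$. Then, as in the proof of Theorem 4.6, we have, on the event $E_n$,
$$u = (C_{11})^{-1}\left(\frac{1}{\sqrt{n}}Z_{(1)} - \lambda K^0D(\hat\beta_{(1)})\right), \qquad \text{(D.11)}$$
$$\frac{1}{n}\left\|\left[X_{(2)}'(\epsilon - X_{(1)}u)\right]_{[g]}\right\| \le \lambda\sqrt{k_g}, \quad \forall g \notin S. \qquad \text{(D.12)}$$
Substituting the value of $u$ from (D.11) in (D.12), we have, on the event $E_n$,
$$\frac{1}{\sqrt{n}}\left\|\left[Z_{(2)} - C_{21}(C_{11})^{-1}Z_{(1)} + \lambda\sqrt{n}\,C_{21}(C_{11})^{-1}K^0D(\hat\beta_{(1)})\right]_{[g]}\right\| \le \lambda\sqrt{k_g},$$
which implies that
$$\left\|\left[Z_{(2)} - C_{21}(C_{11})^{-1}Z_{(1)}\right]_{[g]}\right\| \ge \lambda\sqrt{n}\left(\left\|\left[C_{21}(C_{11})^{-1}K^0D(\hat\beta_{(1)})\right]_{[g]}\right\| - \sqrt{k_g}\right). \qquad \text{(D.13)}$$
Now note that for large enough $n$, if $\|C_{21}\|$ is bounded above, direction consistency guarantees that the expression on the right is larger than
$$\lambda\sqrt{n}\,\sqrt{k_g}\left(\frac{1}{\sqrt{k_g}}\left\|\left[C_{21}(C_{11})^{-1}K^0D(\beta^0_{(1)})\right]_{[g]}\right\| - 1 - \frac{\xi}{2}\right),$$
which in turn is larger than $\frac{\xi}{2}\lambda\sqrt{n}\sqrt{k_g}$, in view of the violation of the weak irrepresentable condition.

This contradicts $P(E_n) \to 1$, since the left-hand side of (D.13) is the norm of a zero mean Gaussian random vector with bounded variance structure $\left[C_{22} - C_{21}(C_{11})^{-1}C_{12}\right]_{[g][g]}$, while the right-hand side diverges as $\lambda\sqrt{n} \to \infty$.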



REFERENCES 



Bach, F. R. (2008). Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res. 9 1179-1225. MR2417268 (2010a:68132)

Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 1705-1732.

Blanchard, O. and Perotti, R. (2002). An empirical characterization of the dynamic effects of changes in government spending and taxes on output. The Quarterly Journal of Economics 117 1329-1368.






Breheny, P. and Huang, J. (2009). Penalized methods for bi-level variable selection. Stat. Interface 2 369-380. MR2540094 (2010k:62290)

Friedman, N. (2004). Inferring cellular networks using probabilistic graphical models. Science's STKE 303 799.

Fujita, A., Sato, J., Garay-Malpartida, H., Yamaguchi, R., Miyano, S., Sogayar, M. and Ferreira, C. (2007). Modeling gene expression regulatory networks with the sparse vector autoregressive model. BMC Systems Biology 1 39.

Golub, G. H. and Van Loan, C. F. (1996). Matrix computations, third ed. Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD. MR1417720 (97g:65006)

Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37 424-438.

Hiemstra, C. and Jones, J. D. (1994). Testing for linear and nonlinear Granger causality in the stock price-volume relation. Journal of Finance 1639-1664.

Huang, J. and Zhang, T. (2010). The benefit of group sparsity. Ann. Statist. 38 1978-2004. MR2676881

Huang, J., Ma, S., Xie, H. and Zhang, C.-H. (2009). A group bridge approach for variable selection. Biometrika 96 339-355. MR2507147

Kim, K., Kim, J. H., Lee, J., Jin, H. M., Lee, S. H., Fisher, D. E., Kook, H., Kim, K. K., Choi, Y. and Kim, N. (2005). Nuclear factor of activated T cells c1 induces osteoclast-associated receptor gene expression during tumor necrosis factor-related activation-induced cytokine-mediated osteoclastogenesis. Journal of Biological Chemistry 280 35209-35216.

Ledoux, M. and Talagrand, M. (1991). Probability in Banach spaces. Isoperimetry and processes. Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)] 23. Springer-Verlag, Berlin. MR1102015 (93c:60001)

Lounici, K., Pontil, M., van de Geer, S. and Tsybakov, A. B. (2011). Oracle inequalities and optimal inference under group sparsity. Ann. Statist. 39 2164-2204.

Lozano, A., Abe, N., Liu, Y. and Rosset, S. (2009). Grouped graphical Granger modeling for gene expression regulatory networks discovery. Bioinformatics 25 i110.

Lütkepohl, H. (2005). New introduction to multiple time series analysis. Springer.

Nardi, Y. and Rinaldo, A. (2008). On the asymptotic properties of the group lasso estimator for linear models. Electron. J. Stat. 2 605-633. MR2426104 (2009k:62175)

Parter, S. V. (1961). Extreme eigenvalues of Toeplitz forms and applications to elliptic difference equations. Trans. Amer. Math. Soc. 99 153-192. MR0120492 (22 #11245)

Pearl, J. (2000). Causality: models, reasoning, and inference 47. Cambridge Univ. Press.

Priestley, M. B. (1981). Spectral analysis and time series. Vol. 2. Multivariate series, prediction and control. Probability and Mathematical Statistics. Academic Press Inc. [Harcourt Brace Jovanovich Publishers], London. MR628736 (83b:62186b)

Rangel, C, Angus, J., Ghahramani, Z., Lioumi, M., Sotheran, E., Gaiba, A., 
Wild, D. L. and Falciani, F. (2004). Modehng T-ceU activation using gene expression 
profiling and state-space models. Bioinformatics 20 1361. 

Raskutti, G., Wainwright, M. J. and Yu, B. (2010). Restricted eigenvalue proper- 
ties for correlated Gaussian designs. J. Mach. Learn. Res. 11 2241-2259. MR2719855 
(2011h:62272) 

RuDELSON, M. and Zhou, S. (2011). Reconstruction from anisotropic random measure- 
ments. Arxiv preprint arXiv:1106.1151vl. 

Shojaie, a., Basu, S. and Michailidis, G. (2012). Adaptive Thresholding for Recon- 
structing Regulatory Networks from Time-Course Gene Expression Data. Statistics in 
Biosciences 4 66-83. 10.1007/sl2561-011-9050-5. 






Shojaie, A. and Michailidis, G. (2010a). Penalized Likelihood Methods for Estimation of Sparse High Dimensional Directed Acyclic Graphs. Biometrika 97 519-538.

Shojaie, A. and Michailidis, G. (2010b). Discovering Graphical Granger Causality Using a Truncating Lasso Penalty. Bioinformatics 26 1517-1523.

Sims, C. A. (1972). Money, income, and causality. The American Economic Review 62 540-552.

van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360-1392.

van de Geer, S., Bühlmann, P. and Zhou, S. (2011). The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso). Electron. J. Stat. 5 688-749. MR2820636

Waterman, M., Jones, K. et al. (1990). Purification of TCF-1 alpha, a T-cell-specific transcription factor that activates the T-cell receptor C alpha gene enhancer in a context-dependent manner. The New Biologist 2 621.

Wei, F. and Huang, J. (2010). Consistent group selection in high-dimensional linear regression. Bernoulli 16 1369-1384. MR2759183

Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Statist. 37 3468-3497. MR2549566 (2011c:62234)

Zhao, P. and Yu, B. (2006). On Model Selection Consistency of Lasso. J. Mach. Learn. Res. 7 2541-2563.

Zhou, S. (2010). Thresholded Lasso for high dimensional variable selection and statistical estimation. arXiv preprint arXiv:1002.1583.



SUMANTA BASU AND GEORGE MICHAILIDIS 

Department of Statistics 
University of Michigan 
Ann Arbor MI 48109 
E-MAIL: sumbose@umich.edu 
gmichail@umich.edu 



Ali Shojaie

Department of Biostatistics
University of Washington
Seattle WA 98195
E-MAIL: ashojaie@u.washington.edu






Table 1 

Performance of different regularization methods in estimating graphical Granger causality 
with balanced group sizes and no misspecification; d = 2, T = 5, SNR = 1.8. Precision 
(P), Recall (R), MCC are given in percentages (numbers in parentheses give standard 
deviations). ERR LAG gives the error associated with incorrect estimation of VAR order. 





                  p = 60, |E| = 351         p = 120, |E| = 1404       p = 200, |E| = 3900
                  Group size = 3            Group size = 3            Group size = 5
         n        160     110     60        160     110     60        160     110     60
P        Lasso    80(2)   75(2)   66(4)     69(1)   62(2)   52(2)     52(1)   47(1)   38(1)
         Grp      95(2)   91(4)   83(7)     91(3)   80(5)   68(7)     78(4)   72(3)   59(6)
         Thgrp    96(1)   92(3)   86(6)     93(3)   83(5)   70(7)     82(4)   76(3)   64(6)
         Agrp     96(2)   92(4)   83(7)     92(3)   82(5)   69(7)     81(3)   74(3)   60(6)
R        Lasso    71(2)   54(2)   31(2)     54(1)   40(1)   22(1)     38(1)   28(1)   15(1)
         Grp      99(1)   93(3)   71(7)     91(2)   81(2)   48(8)     84(1)   70(2)   41(4)
         Thgrp    99(1)   93(3)   71(7)     91(2)   81(2)   48(8)     84(2)   69(2)   41(3)
         Agrp     99(1)   93(3)   71(7)     91(2)   81(2)   47(8)     84(1)   69(2)   40(4)
MCC      Lasso    75(2)   63(2)   45(3)     60(1)   19(1)   33(1)     43(1)   35(1)   23(1)
         Grp      97(1)   92(3)   76(5)     91(1)   80(2)   56(2)     81(2)   70(2)   48(2)
         Thgrp    98(1)   93(2)   78(5)     92(1)   81(2)   57(3)     83(2)   72(2)   50(3)
         Agrp     97(1)   92(3)   76(5)     91(1)   81(2)   56(3)     82(2)   71(2)   48(2)
ERR LAG  Lasso    10.5    11.3    13.9      16.63   17.37   16.69     19.79   20      18.52
         Grp      3.19    6.95    12.76     4.86    10.77   12.65     4.21    5.27    7.8
         Thgrp    2.83    5.87    10.01     3.98    9.03    11.19     3.06    3.91    5.68
         Agrp     3.13    6.89    12.59     4.63    10.37   12.34     3.58    4.87    7.59
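Precision, recall and the Matthews correlation coefficient (MCC) reported in these tables are standard edge-recovery summaries: with TP, FP, FN, TN counted over the entries of the true and estimated adjacency structures, $P = TP/(TP+FP)$, $R = TP/(TP+FN)$ and $\mathrm{MCC} = (TP\cdot TN - FP\cdot FN)/\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}$. A minimal sketch, assuming each nonzero entry of an adjacency array counts as one edge; which entries the simulations actually score (for instance, whether each lag is counted separately) is not specified here, and the function name is hypothetical.

```python
import numpy as np


def edge_metrics(true_adj, est_adj):
    """Precision, recall and MCC (in percent) for recovering the nonzero entries of true_adj."""
    t = np.asarray(true_adj, dtype=bool).ravel()
    e = np.asarray(est_adj, dtype=bool).ravel()
    tp = int(np.sum(t & e))    # true edges that were found
    fp = int(np.sum(~t & e))   # spurious edges
    fn = int(np.sum(t & ~e))   # missed edges
    tn = int(np.sum(~t & ~e))  # correctly absent edges
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"P": 100 * precision, "R": 100 * recall, "MCC": 100 * mcc}


# One hit, one false positive, one miss, one true negative.
m = edge_metrics([[0, 1], [1, 0]], [[0, 1], [0, 1]])
perfect = edge_metrics(np.eye(2), np.eye(2))
```

Unlike precision and recall alone, MCC also rewards correctly absent edges, which matters in sparse networks where true negatives dominate.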



Table 2

Performance of different regularization methods in estimating graphical Granger causality with unbalanced group sizes and no misspecification; d = 2, T = 5, SNR = 1.8. Precision (P), Recall (R), MCC are given in percentages (numbers in parentheses give standard deviations). ERR LAG gives the error associated with incorrect estimation of VAR order.







                  p = 60, |E| = 450         p = 120, |E| = 1575       p = 200, |E| = 4150
                  Groups = 1 x 10, 11 x 5   Groups = 1 x 10, 23 x 5   Groups = 1 x 10, 39 x 5
         n        160     110     60        160     110     60        160     110     60
P        Lasso    72(2)   69(3)   62(2)     51(1)   18(1)   41(1)     61(1)   53(1)   42(2)
         Grp      84(4)   79(6)   76(9)     55(5)   47(5)   40(6)     86(3)   77(5)   66(7)
         Thgrp    86(4)   82(7)   78(11)    60(6)   50(7)   40(5)     88(2)   79(6)   69(6)
         Agrp     85(3)   81(5)   77(9)     59(5)   51(5)   42(6)     88(2)   78(5)   67(6)
R        Lasso    45(2)   35(2)   22(2)     43(1)   34(1)   22(1)     23(1)   15(0)   7(0)
         Grp      94(3)   87(5)   61(8)     88(2)   75(5)   48(6)     73(3)   49(6)   22(5)
         Thgrp    95(2)   88(4)   62(8)     89(3)   77(4)   50(5)     73(3)   50(6)   21(5)
         Agrp     94(3)   87(5)   61(8)     88(2)   75(5)   48(6)     73(3)   49(6)   22(5)
MCC      Lasso    56(2)   48(2)   35(2)     46(1)   39(1)   29(1)     36(1)   28(1)   17(1)
         Grp      89(3)   82(4)   67(5)     68(3)   58(3)   42(3)     79(1)   61(3)   37(3)
         Thgrp    90(3)   84(4)   68(6)     72(4)   61(4)   43(2)     80(1)   62(3)   37(3)
         Agrp     89(3)   83(4)   67(6)     71(3)   60(3)   43(3)     79(1)   61(3)   37(3)
ERR LAG  Lasso    10.59   10.74   11.76     18.3    18.72   18.76     11.54   10.93   9.29
         Grp      7.04    9.85    13.04     12.53   14.71   13.06     4.8     6.41    6.85
         Thgrp    6.58    8.98    11.1      9.6     11.9    10.9      4.06    5.65    5.7
         Agrp     6.74    9.19    12.96     10.81   12.78   11.79     4.55    6.2     6.81
Table 3

Performance of different regularization methods in estimating graphical Granger causality with misspecified groups (30% misspecification); d = 2, T = 10, SNR = 2. Precision (P), Recall (R), MCC are given in percentages (numbers in parentheses give standard deviations). ERR LAG gives the error associated with incorrect estimation of VAR order.







                  p = 60, |E| = 246         p = 120, |E| = 968
                  Group size = 6            Group size = 6
         n        160     110     60        160     110     60
P        Lasso    88(2)   85(3)   77(5)     59(1)   55(1)   49(2)
         Grp      65(2)   66(2)   66(3)     43(3)   44(4)   38(4)
         Thgrp    87(3)   88(3)   85(3)     56(6)   56(6)   51(7)
         Agrp     65(2)   66(2)   66(3)     45(2)   45(4)   39(4)
R        Lasso    80(3)   63(3)   37(2)     66(1)   54(1)   35(1)
         Grp      100(0)  98(2)   82(6)     87(2)   78(3)   59(4)
         Thgrp    100(0)  98(2)   79(6)     86(2)   79(3)   57(4)
         Agrp     100(0)  98(2)   82(6)     86(2)   78(3)   58(3)
MCC      Lasso    84(2)   73(2)   53(3)     62(1)   54(1)   41(1)
         Grp      81(1)   80(2)   74(4)     61(2)   58(3)   47(2)
         Thgrp    93(2)   93(2)   82(4)     69(4)   66(4)   53(3)
         Agrp     81(1)   80(2)   74(4)     62(2)   59(2)   47(2)
ERR LAG  Lasso    12.63   17.05   22.41     45.09   49.68   53.4
         Grp      9.43    8.78    15.12     18.22   18.43   29.26
         Thgrp    6.45    5.34    8.02      11.81   12.84   15.57
         Agrp     9.11    8.78    14.96     16.32   16.9    27.69
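Under 30% group misspecification, Thgrp retains a precision of 85-88% at p = 60, while Grp and Agrp drop to about 65-66%. One way to see why a second thresholding stage can help: a group lasso step keeps or drops a whole group, so a wrongly grouped null coefficient is carried along with its active group mates. The sketch below illustrates this with the group soft-thresholding operator (the proximal map of the group lasso penalty, a standard building block of group lasso solvers) followed by a hypothetical within-group hard threshold `tau`. The function names, the single fixed `tau`, and the toy numbers are our assumptions; this is not the paper's Thgrp estimator or its threshold rule.

```python
import numpy as np


def group_soft_threshold(beta, groups, lam):
    """Proximal operator of lam * sum_g ||beta_g||_2 (the group lasso penalty)."""
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    out = np.zeros_like(beta)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(beta[idx])
        if norm > lam:
            # Shrink the whole group toward zero; groups with small norm are zeroed out.
            out[idx] = (1.0 - lam / norm) * beta[idx]
    return out


def within_group_threshold(beta, tau):
    """Hypothetical second stage: zero out individual coefficients with |beta_j| < tau."""
    out = np.array(beta, dtype=float)  # copy, so the input is left unchanged
    out[np.abs(out) < tau] = 0.0
    return out


# Group 0 is active and kept (shrunk); group 1 is small and dropped entirely.
shrunk = group_soft_threshold([3.0, 4.0, 0.3, 0.4], [0, 0, 1, 1], lam=1.0)

# A "misspecified" group: one strong coefficient grouped with a null one.
b = group_soft_threshold([3.0, 0.1], [0, 0], lam=0.5)  # the null entry survives
t = within_group_threshold(b, tau=0.5)                 # and is removed here
```

The group step alone cannot separate the two entries of `b`, since it only rescales the group; the elementwise threshold is what removes the misgrouped coefficient.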



Table 4 

Mean and standard deviation (in parentheses) of PMSE (MSE in case of Dec 2010) for prediction of banking balance sheet variables.

Quarter     Lasso         Grp           Agrp          Thgrp
Dec 2010    1.59 (0.29)   0.36 (0.05)   0.36 (0.05)   0.37 (0.05)
Mar 2011    1.46 (0.30)   0.47 (0.23)   0.47 (0.23)   0.46 (0.22)
Jun 2011    1.33 (0.26)   0.36 (0.11)   0.36 (0.11)   0.35 (0.11)
Sep 2011    1.72 (0.32)   0.50 (0.18)   0.50 (0.18)   0.47 (0.16)
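The prediction mean squared error (PMSE) summarized in Table 4 averages squared one-step-ahead prediction errors. A minimal sketch, assuming the reported mean and standard deviation are taken across the balance sheet variables for a given quarter; the actual averaging set used for the table is not stated in this section, and the function name is hypothetical.

```python
import numpy as np


def pmse_summary(y_true, y_pred):
    """Mean and sample standard deviation of squared prediction errors across variables."""
    sq_err = (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2
    return float(sq_err.mean()), float(sq_err.std(ddof=1))


# Squared errors here are 0, 1 and 4.
mean_pmse, sd_pmse = pmse_summary([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

`ddof=1` gives the sample (n - 1) standard deviation; if the table used the population version, `ddof=0` would be the choice instead.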



Table 5 

Mean and standard deviation of MSE for different NGC estimates

         Lasso   Grp     Agrp    Thgrp
mean     0.649   0.456   0.457   0.456
stdev    0.340   0.252   0.251   0.252






Fig 3: Estimated Networks of banking balance sheet variables using lasso. 
The network represents the aggregated network over 5 time points. 
Fig 4: Estimated Networks of banking balance sheet variables using group lasso. The network represents the aggregated network over 5 time points.
Fig 5: Estimated Gene Regulatory Networks of T-cell activation. The width of each edge represents the number of effects between two groups, and the network represents the aggregated regulatory network over 3 time points.
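Figures 3 to 5 show networks aggregated over time points, and Fig 5 additionally collapses nodes into groups with edge width given by the number of effects. A minimal sketch of that aggregation, assuming lag-specific coefficient matrices `coefs[k]` with `coefs[k][i, j] != 0` meaning node j at lag k+1 affects node i, and assuming an effect is counted once if present at any lag (union over time points). Both the index convention and the counting rule are our assumptions, as is the function name.

```python
import numpy as np


def aggregate_group_network(coefs, groups):
    """Collapse lag-specific coefficient matrices into a group-to-group effect count matrix.

    coefs: array of shape (d, p, p); coefs[k, i, j] != 0 means node j at lag k+1 affects node i.
    groups: length-p sequence of group labels.
    Returns (labels, counts) with counts[a, b] = number of node-level effects labels[a] -> labels[b].
    """
    coefs = np.asarray(coefs)
    groups = np.asarray(groups)
    # Keep an effect j -> i if it is present at any lag (union over time points).
    adj = np.any(coefs != 0, axis=0)
    labels = np.unique(groups)
    counts = np.zeros((len(labels), len(labels)), dtype=int)
    for a, source in enumerate(labels):
        for b, target in enumerate(labels):
            # Rows index targets, columns index sources.
            counts[a, b] = int(adj[np.ix_(groups == target, groups == source)].sum())
    return labels, counts


coefs = np.zeros((2, 3, 3))
coefs[0, 2, 0] = 0.5   # node 0 -> node 2 at lag 1 (group A -> group B)
coefs[1, 2, 1] = 0.3   # node 1 -> node 2 at lag 2 (A -> B)
coefs[1, 0, 2] = -0.2  # node 2 -> node 0 at lag 2 (B -> A)
labels, counts = aggregate_group_network(coefs, ["A", "A", "B"])
```

Counting each lag separately instead would replace `np.any(...)` with `np.sum(coefs != 0, axis=0)`; which convention the figures use is not stated here.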